DOCUMENT RESUME

ED 133 334                                              W 005 575

AUTHOR        Selby, David
TITLE         Item Non-response in the First Follow-Up Survey of the
              National Longitudinal Survey of the High School Class
              of 1972.
INSTITUTION   Educational Policy Research Center for Higher
              Education and Society, Washington, D.C.
PUB DATE      May 76
CONTRACT      300-76-0026
NOTE          181p.

EDRS PRICE    MF-$0.83 HC-$10.03 Plus Postage.
DESCRIPTORS   *Followup Studies; *Graduate Surveys; High School
              Graduates; *Longitudinal Studies; *National Surveys;
              *Research Problems; *Response Style (Tests)
IDENTIFIERS   Missing Data; *National Longitudinal Survey Hi Sch
              Class 1972

ABSTRACT
The paper describes a variety of analytical difficulties facing
prospective users of the first follow-up of the National Center for
Education Statistics National Longitudinal Survey of the High School
Class of 1972 (NLS) and suggests some possible approaches to coping
with these. The primary focus is on the causes and consequences of
selective item non-response in the first follow-up survey. Coding
schemes used to flag this non-response and alternative approaches to
estimating values for missing data are discussed. An examination of
the special codes used for routing-pattern errors and missing data
leads to the proposal of preparation of an analysis-oriented data file
to parallel, but not replace, the existing documentary file. Certain
coding modifications are mentioned which might be implemented for such
a file. An examination of patterns of item non-response leads to the
conclusion that the questionnaire's content and format, especially
requests for detailed and/or private information, complex routing
patterns, and a layout better suited to personal interviews than to
mail-out collection, are probably responsible for some item
non-response. Possible modifications that might reduce item
non-response in future follow-up surveys are suggested. Review of
several approaches to adjustment for missing data leads the authors to
recommend a specific imputation procedure for data already collected.
Also described are some possible methodological studies aimed at
testing the effects of data assignments upon characteristics of the
present NLS data base. (Author/RC)

***********************************************************************
Documents acquired by ERIC include many informal unpublished materials
not available from other sources. ERIC makes every effort to obtain
the best copy available. Nevertheless, items of marginal
reproducibility are often encountered and this affects the quality of
the microfiche and hardcopy reproductions ERIC makes available via the
ERIC Document Reproduction Service (EDRS). EDRS is not responsible for
the quality of the original document. Reproductions supplied by EDRS
are the best that can be made from the original.
***********************************************************************

Joseph Froomkin Inc. 

1015 Eighteenth Street, N.W., Washington, D.C. 20036



ITEM NON-RESPONSE IN THE FIRST FOLLOW-UP SURVEY
OF THE NATIONAL LONGITUDINAL SURVEY OF
THE HIGH SCHOOL CLASS OF 1972



David Selby
Joseph Froomkin Inc.






EPRC for Higher Education and Society
Contract No. 300-76-0026
May 1976



ABSTRACT 



The paper describes a variety of analytical difficulties facing prospective
users of the first follow-up of the NCES NLS survey and suggests some
possible approaches to coping with these.

The primary focus of this paper is on the causes and consequences of
selective item non-response in the first follow-up survey. Coding schemes
used to flag this non-response and alternative approaches to estimating
values for missing data are discussed.

An examination of special codes used for routing-pattern errors and
missing data leads us to propose preparation of an analysis-oriented data
file to parallel, but not replace, the existing documentary file. We mention
certain coding modifications which might be implemented for such a file.

An examination of patterns of item non-response leads us to conclude that
the questionnaire's content and format, especially requests for detailed
and/or private information, complex routing patterns, and a layout better
suited to personal interviews than to mail-out collection, are probably
responsible for some item non-response. We suggest possible modifications
that might reduce item non-response in future follow-up surveys.

Review of several approaches to adjustment for missing data leads us to
recommend a specific imputation procedure for data already collected.
We also describe some possible methodological studies aimed at testing
the effects of data assignments upon characteristics of the present NLS
data base.



TABLE OF CONTENTS 



INTRODUCTION                                                          1

   Overview of the paper.                                             2
   The issues in brief.                                               5
   Procedure.                                                         8

CODING                                                                9

   Relation to item non-response.                                     9
   Description of special codes.                                     10

ANALYTICAL DIFFICULTIES ARISING FROM ROUTING-ERROR CODES            13

   Incomplete "flagging."                                            13
   Response eligibility.                                             14
   Differing effects on calculated response rates.                   17

ANALYTICAL DIFFICULTIES ARISING FROM OTHER CODING                   31

   Code 98--BLANK.                                                   31
   Response Option: "Does not apply."                                36
   Code 93--PARTIAL RESPONSE.                                        37
   Uncodable Responses--OUT OF RANGE (95) and MULTIPLE
      RESPONSE (96).                                                 39
   Comments on Category Labels.                                      39

SUMMARY                                                              41

ITEM NON-RESPONSE                                                    43

   Problems associated with item non-response.                       43
   Patterns of item non-response.                                    44
   Summary.                                                          68

ASSIGNMENT OF DATA TO ADJUST FOR ITEM NON-RESPONSE:
ISSUES, METHODS, AND EXPERIMENTATION                                 72

   Introduction.                                                     72
   Basic issues: an overview.                                        74
   Assignment practices: review and critique.                        78
   Need for empirical studies of item non-response bias.            109






WORKS CITED

APPENDICES:

   Table 2   BASIC ITEM CONTENT, BY RATE OF USABLE
             RESPONSE AND NUMERICAL SEQUENCE

   Table 3   RESPONSE DISTRIBUTIONS, SELECTED FIRST
             FOLLOW-UP AND BASE-YEAR ITEMS

   Table 4   IMPACT OF ALLOCATION ON 1970 SCHOOL
             ENROLLMENT DISTRIBUTION

   Table 5   IMPACT OF ALLOCATION ON 1969 "WEEKS
             WORKED" DISTRIBUTION

   Table 6   IMPACT OF ALLOCATION ON 1970 EDUCATIONAL
             ATTAINMENT DISTRIBUTION

   Table 7   IMPACT OF ALLOCATION ON 1969 FAMILY
             INCOME DISTRIBUTION



INTRODUCTION 



The National Longitudinal Study of the High School Class of
1972 is an ambitious, costly effort by the National Center for Educational
Statistics to trace the careers of a cohort of young persons during the
years following high school. The large sample selected (over 20 thousand
participants) and the long questionnaires (some 50 pages in the original
wave and the follow-ups) lead one to believe that a treasure-trove has
been created for researchers interested in following up schooling
decisions and career choices by young persons.

The Educational Policy Research Center for Higher Education
and Society is especially interested in the insights which could be gained
by analyzing this data. The topics covered are central to its mandate to
develop policy-relevant information about the dynamics of choices to
continue one's education beyond high school, the ways education is financed,
reasons for not continuing education beyond high school, and the work
experience of both those who stopped their education with high school and
those who continued.

Even a cursory examination of the summary of the first
follow-up impressed staff members of the Center with the difficulty of
proposing meaningful analyses of this information. The complexity of
the questionnaire, the difficulty of tracing response patterns, and the
rather uneven luck in obtaining information for selected questions have
caused us to analyze carefully some of the possible pitfalls which lie in
the paths of analysts of the subject survey.

The extended methodological note which follows should be
useful to users of the survey, to the staff of NCES who may wish to
commission various analyses of the data, and to planners of future
large-scale surveys. We hope that it will stimulate an exchange among data
users which will enhance the usefulness of the data and help them
economize effort to obtain maximum results.

Overview of the paper.

The first follow-up survey of the National Longitudinal Study
of the High School Class of 1972 (NLS-HS) covers the early postsecondary
experience of the sample members. The data from that survey are
flawed by high rates of non-response to many items. The problem is
quite critical when response is very low on a very important item. For
example, only some 60 per cent of those listed as eligible to answer the
question gave the amount of their first-year expenditures for school
tuition and fees.*

* See Table 3, item F46BA.

Such gaps in information can seriously damage efforts to
trace the long-term school and work experiences of the 1972 graduates.
It will be quite difficult to determine reliably what relationships exist
between (a) the base-year (pre-graduation) circumstances, (b) the early
postsecondary school-and-work experiences, and (c) the later experiences
of the class of 1972. The links between (a) and (c) will be especially
hard to establish given poor information about the intervening period.
Continued low item response rates in future follow-up surveys, coupled
with normal sample attrition, will further aggravate analytical difficulties.

Our analysis of the patterns of item non-response in the first
follow-up survey has several objectives:

1. to determine what information is most affected
   by item non-response,

2. to locate probable sources of item non-response,

3. to suggest possible ways of reducing item
   non-response in future follow-ups, and

4. to examine and assess various approaches to
   adjustment for missing data.

We examined item response rates as published in the user's
manual for the follow-up survey. We found that response rates were
very low for several matters of the greatest policy importance. Among
the most often omitted items were those covering income, financing
education, and other "money matters;" reasons for past choices; future
expectations; and, in general, details about experience.

We attribute much of the item non-response to the format
and content of the questionnaire, especially to its complex routing patterns
and a format ill-suited to self-administration. We also found that the
coding scheme used to prepare the documentary data file (the "data of
record") creates analytical difficulties, often spuriously inflates item
non-response rates, and hampers identification of valid responses.

We suggest possible revisions in the questionnaire, for use
in future follow-ups. Since we presume its content to be justified by
specific information needs, the suggestions are limited to matters of
format and response options. We emphasize that our suggestions must
be proven successful in field pre-tests before adoption for use in future
surveys.

We suggest the preparation of an analysis-oriented data file,
paralleling the documentary file. For this effort, we recommend
imposition of judgments about the validity of some questioned responses,
assignment of values for missing data, and appropriate recoding. We
emphasize that the documentary file, and its present coding scheme,
should be retained as the primary record of the first follow-up data.

Our appraisal of various ways of assigning missing data leads
to a recommendation that the method employed by the University of
Michigan Institute for Social Research, for its "Panel Study of Income
Dynamics," is best suited for use with the NLS-HS data. We also recommend,
however, that empirical studies of the effects of data assignment upon the
NLS-HS data base should be conducted under NCES auspices.

This paper is organized in two major sections. The first
contains our description and analysis of patterns of item non-response.
The second contains our discussion of approaches to adjustment for
missing data.

The first section opens with a detailed discussion of coding
problems, focusing particularly on the routing-error codes which created
problems in calculating item response rates. The remainder of the first
section contains our discussion of the kinds of information most affected
by non-response and our (conjectural) analysis of the sources of
non-response.

The second section opens with a short commentary on the pros
and cons of data assignment, especially with regard to longitudinal
studies. Our review and critique of treatments for missing data in seven
large data bases follows. The section closes with a brief discussion of
the need for methodological studies to assess the consequences of data
adjustment for the NLS-HS data base, and suggests some possible
avenues of exploration for such studies.

The issues in brief.

As noted above, analysis of the survey is complicated by (1)
high rates of non-response for certain items, (2) the use of data codes
which make the computation of rates of non-response difficult, and (3)
ambiguities in answers probably due to some design features of the
instrument used to collect the data.

Item non-response, which is apparently quite large for some
items, is difficult to deal with in most complex analyses. Selective
non-response to particular items could be motivated by respondent
characteristics not measured by sample-selection variables, and hence
may require more complex adjustment procedures than those used to
re-weight the sample for questionnaire non-response.

Adjustments of data for item non-response must rest on
sophisticated guesses about what response would have been made had
the respondent answered the question. Or, if an analyst decides not to
modify the data base, the power of results is weakened, sometimes
drastically, since the exact population to which generalizations can be
made may be undefinable. Since conclusions may be affected by either
course of action, decisions about data modification are important analytical
issues.

Coding issues stem from the rules devised for transforming
raw information--as supplied by a respondent--into analyzable "data."
Rules for coding normally are devised with some particular objective in
mind, and are fundamental to the processing and analysis of data. The
intended uses of the data govern the coding policies; that is, the coded
data incorporated in a file are products of a chain of policy decisions,
and these generally result from certain intentions and assumptions on the
part of those who devise the coding rules. Coding issues may arise, for
any data base, when the objectives of coding are unclear or when
different prospective users of the data base have different objectives for
use of the data.

The fundamental policy of R.T.I., which devised and
implemented the coding rules for the NLS-HS data, was to avoid imposing
judgments upon the data. They sought to retain as much of the diversity
of "raw" responses as was consistent with the production of an
interpretable data file. As a rule, this is a good policy, and we emphasize
that R.T.I. is to be applauded for adopting it. However, in implementing
this general policy, R.T.I. devised a coding scheme which, we think,
makes it difficult for analysts to use the data.

Most important, the coding of non-response and errors in
following routing patterns results in systematic misstatement of the
proportions of usable response to items. This is a serious flaw because
some prospective users may be dissuaded from attempting analysis,
owing to artificially inflated non-response rates published in the user's
manual, and because some people who were not eligible to answer certain
items have been coded as "eligible but not responding."

Questionnaire design is the paramount issue for future waves
of the NLS-HS survey. The First Follow-up Questionnaire has proved
to fall short of its intended purpose in several ways; this may be due to:

(1) its physical layout, designed as if it were to be
    administered by a trained interviewer;

(2) complex routing instructions, an important source
    of confusion to respondents in the self-administered
    instrument;

(3) the response options provided for many items,
    which introduce unnecessary ambiguities for the
    respondent and probably underlie at least some
    of the skip-pattern errors;

(4) the lack of certain response options (chiefly
    "don't know"), which probably induces much
    of the item non-response and causes loss of
    (fairly) firm estimates of the extent to which
    respondents' ignorance of important matters
    may underlie their decisions, acts, and general
    experience;

(5) the number of pages (and hence the apparent
    length of the questionnaire), which is increased
    by wasteful use of space.* Since the propensity
    to respond is doubtless affected by the
    recipients' initial impression of how long the
    questionnaire is (which might be judged from
    the number of pages), compact spacing throughout
    is advisable;

(6) the booklet format of the questionnaire, which
    permits the respondent to enter at any point of
    his choice. Respondents who do not follow the
    prescribed item sequence, i.e., #1 to N, may
    well make skip-pattern errors and/or become
    so entangled in the various routing paths that
    they simply give up attempts to respond.

* We presume the use of space was designed to ease processing of the
  questionnaire, an understandable objective, but not completely
  compatible with self-administered data collection.

Procedure.

Since the technical discussion underlying the foregoing
judgments about the methodological problems of the NLS-HS data base makes
frequent and detailed reference to the "Base Year and First Follow-up
Data File Users Manual" (R.T.I., 1975) for the survey, we urge the
reader to obtain a copy of that document for use in following the discussion.

The technical discussion focuses on item non-response in
the first follow-up survey. In the main, the base year survey is ignored
because the mode of data collection used there will not be repeated and
because item response rates there are generally high.

Our discussion covers only selected items from the First
Follow-up Questionnaire. The items reviewed, we believe, are those
relevant to policy issues likely to be discussed by federal policy analysts.
These are enrollment, financing of postsecondary education, labor force
participation and income, and reasons for decisions about postsecondary
schooling. Some descriptive items, such as marital status and family
background, are also considered. Other items may present equally
difficult problems, but are not central to our interests.

This discussion is based on analyses of response patterns
derived from the item response distributions published by RTI in the
User's Manual. A supplementary paper, based on an examination of
individual records from a special run of the public-use data tape, is now
in progress.*

* The supplemental investigation is a cooperative effort between our
  group and the College Entrance Examination Board.

CODING

Relation to item non-response.

A discussion of coding problems must precede our comments
on the core issue of item non-response because RTI's coding scheme
makes it very difficult to calculate accurate estimates of non-response.
RTI has created a complex set of codes to represent the variety of
circumstances under which a clearly usable answer was not obtained. The
codes on certain questions determine whether or not a person was counted
among those eligible to answer other questions. Since the usable
response rate for any item is based on the number of people eligible to
answer the question, and since the way codes are used can inflate that
base, calculated item non-response is very strongly influenced by the
coding system. Mechanical application of decision rules, which exclude
only cases carrying certain codes from among item eligibles, often leads
to overstatement of the number of people eligible and, therefore, to
understatement of the usable response rate.*

* As RTI indicates: "The effect of this coding for non-response is
  to overestimate the illegitimate non-response. . . . This implies that
  the user should be quite careful in interpreting the [non-response]
  codes." (User's Manual: 29-30)

Description of special codes.

To follow the technical discussion, the reader must be
acquainted with certain RTI codes and their use. We describe them
briefly, but advise the reader to augment this by studying pages 22 through
30 of the User's Manual.

Routing-error "flag" codes. Respondents to the First
Follow-up Questionnaire were not required to answer every question.
Certain questions, called "routing items," direct the respondent to other
questions which he should answer. This is done by use of an instruction,
"Skip to question ___," which is keyed to one or more of the response
options for the routing item. Ideally, the respondent should answer the
routing question and all questions to which he is led by the instruction,
but should not answer any questions he has been routed around.* In
practice, respondents often failed to meet this ideal response pattern:
forty-three per cent of the first follow-up respondents made at least one
error in following routing instructions.**

* Routing items may lead the respondent either to or around blocks of
  several questions.

** See User's Manual Table 6, "Quality Indices--Routing Questions"
   (p. 30).

Routing errors occur when a person answers the routing
question and then fails to follow the skip instruction as directed. There
are several ways such failures can occur, and RTI has devised a series
of "flag" codes for routing items to indicate that there is something
wrong with either the routing-item response or subsequent responses.

Questionable routing-item answers are flagged by adding 20,
40, or 60 to the basic response code, depending on the type of
inconsistency. Twenty (20) is added for respondents who answered some
subsequent question they were directed to skip. Forty (40) is added when
subsequent questions that were to be answered were left blank. Sixty
(60) is added for respondents who made both errors, that is, answered
a question they were directed to skip and failed to answer others they
were to have answered.
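
The flagging rule just described can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not RTI's processing code: the function name is hypothetical, and the sketch assumes that basic response codes for routing items run from 1 to 19, which the text does not state.

```python
# Meanings of the routing-error flags, as described in the text above.
FLAG_MEANINGS = {
    0: "no routing error flagged",
    20: "answered a later question that should have been skipped",
    40: "left blank a later question that should have been answered",
    60: "made both kinds of error",
}

# Special non-response and unusable-response codes (93-99) carry no flag.
SPECIAL_CODES = range(93, 100)


def split_routing_code(code):
    """Return (flag, basic_code) for a routing-item code.

    Assumption (not stated in the source): basic response codes are
    integers from 1 to 19, so the flag can be recovered by integer
    division by 20.
    """
    if code in SPECIAL_CODES:
        return None, code
    flag = (code // 20) * 20
    basic = code - flag
    if flag not in FLAG_MEANINGS or basic == 0:
        raise ValueError(f"not a valid routing-item code: {code}")
    return flag, basic


if __name__ == "__main__":
    # Code 42 on item F23 is basic code 2 carrying the "40" flag.
    flag, basic = split_routing_code(42)
    print(flag, basic, FLAG_MEANINGS[flag])
```

Under these assumptions, a code of 42 separates into the flag 40 and the basic response 2, which matches the use of code 42 for item F23 discussed later in the paper.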

Non-response and unusable response codes. Several special
codes were used to mark absence of a usable response. Three separate
kinds of non-response are distinguished, and there are four codes for
unusable responses.

The non-response codes are:

(a) Code 99 (LEGITSKIP), used for respondents who properly
    skipped an item they were routed around, as well as
    for 1,048 respondents to the base-year survey who
    returned no follow-up information.

(b) Code 93 (PARTIAL RESPONSE), used for non-response
    on a particular item which is part of a set of related
    items,* when other items in the set were answered.

(c) Code 98 (BLANK), used to code item non-response when
    neither code 99 nor code 93 applies. (The "residual"
    nature of this code underlies many difficulties in
    calculating item response rates, as discussed below.)

* For example, the item "Needed to earn money to support my family,"
  which is one of 17 reasons listed separately in item F24 (Why did you
  NOT continue formal education after high school).

The four unusable response codes are:

(a) Code 94 (DON'T KNOW)

(b) Code 95 (OUT OF RANGE)

(c) Code 96 (MULTIPLE RESPONSE)

(d) Code 97 (REFUSED ANSWER)

The uses of these "garbage" codes are fairly straightforward, although
we comment below on the possibly misleading labels.

"Garbage" codes rarely apply to significant proportions of
response and create few analytical difficulties, in contrast to the "flag"
and "non-response" codes.
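
As an aid to reading the distributions, the seven special codes listed above can be held in two small lookup tables. The sketch below is illustrative only; the function names are hypothetical, and it treats every code outside 93-99 as a substantive answer, which ignores the routing-error flags.

```python
from collections import Counter

# Special codes as listed in the text.
NON_RESPONSE_CODES = {93: "PARTIAL RESPONSE", 98: "BLANK", 99: "LEGITSKIP"}
UNUSABLE_CODES = {
    94: "DON'T KNOW",
    95: "OUT OF RANGE",
    96: "MULTIPLE RESPONSE",
    97: "REFUSED ANSWER",
}


def classify(code):
    """Return (category, label) for one coded response.

    Simplification: any code outside 93-99 is called "usable" here,
    although a flagged routing-item answer may still be questionable.
    """
    if code in NON_RESPONSE_CODES:
        return "non-response", NON_RESPONSE_CODES[code]
    if code in UNUSABLE_CODES:
        return "unusable", UNUSABLE_CODES[code]
    return "usable", None


def tally(codes):
    """Count coded responses by category."""
    return Counter(classify(code)[0] for code in codes)


if __name__ == "__main__":
    # Hypothetical codes for one item, for illustration only.
    print(tally([1, 2, 98, 99, 94, 93, 1]))
```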

ANALYTICAL DIFFICULTIES ARISING FROM 
ROUTING-ERROR CODES 

The routing-error codes, as used by RTI, create analytical
difficulties described below.

Incomplete "flagging."

The simplest instance is the lack of routing-error "flags,"
comparable to those used for routing items, for conditional items.* In
the published response distributions for conditional items, answers that
are inconsistent with the routing-item responses are not distinguished
from those that are consistent. This makes it impossible to count the
number of "clean" (certainly usable) responses directly from the
distributions. To determine whether or not conditional- and routing-item
responses are consistent, one must perform special computer runs in
which both responses are compared. This adds a data-processing step
which is costly, and provides extra opportunities for analysis error.
The lack of "flags" for conditionals is particularly troublesome when
a routing item controls entrance to blocks of conditional items. In such
cases, it may be necessary to check consistency for every item in the
block in order to find the origin(s) of a routing-item "flag" code.

* Conditional items (questions) are those for which an answer is expected
  only on condition that a particular routing-item response was given.
  That is, conditional items are those to which a person is directed by
  the routing (or SKIP) instructions keyed to routing-item responses.

The absence of "flag" codes on conditional responses is a
serious flaw, not only because it requires extra data processing but also
because potential users cannot easily assess the adequacy of the data
base by reference to the User's Manual. Some studies may not be
undertaken because prospective analysts are dissuaded from attempting them.
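
The kind of record-by-record comparison that such a special computer run would perform can be sketched as follows. Everything here is hypothetical: the record layout, the function name, and the status labels are invented for illustration, and the coding of F23 as 1 = YES and 2 = NO is an assumption (only NO = 2 is implied by the text's use of code 42). Routing-error flags are assumed to have been stripped from the routing-item code beforehand.

```python
# Codes meaning that nothing was marked for the item (see the text):
# 93 = PARTIAL RESPONSE, 98 = BLANK, 99 = LEGITSKIP.
NOT_ANSWERED = {93, 98, 99}


def check_conditional(routing_basic_code, conditional_code, codes_routing_in):
    """Compare one conditional-item code with its routing-item answer.

    codes_routing_in is the set of routing-item answers that direct the
    respondent to the conditional item. Unusable codes (94-97) count as
    "answered" here, since the respondent did mark something.
    """
    should_answer = routing_basic_code in codes_routing_in
    answered = conditional_code not in NOT_ANSWERED
    if should_answer:
        return "clean" if answered else "missing"
    return "inconsistent" if answered else "skipped as directed"


# Hypothetical records. F23: 1 = YES, 2 = NO (assumed); a NO answer
# routes the respondent to the reasons listed in F24A-F24Q.
records = [
    {"id": 1, "F23": 2, "F24A": 1},   # NO, and gave a reason
    {"id": 2, "F23": 2, "F24A": 98},  # NO, but left the reasons blank
    {"id": 3, "F23": 1, "F24A": 1},   # YES, but answered F24A anyway
    {"id": 4, "F23": 1, "F24A": 99},  # YES, and skipped F24A
]

statuses = [check_conditional(r["F23"], r["F24A"], {2}) for r in records]

if __name__ == "__main__":
    for record, status in zip(records, statuses):
        print(record["id"], status)
```

For a block of conditional items, the same comparison would have to be repeated for each item in the block, which is the extra processing burden the text describes.
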

Response eligibility.

A far more serious analytical difficulty created by the
routing-error codes is inflation of the number of people counted eligible to
answer conditional items. As previously noted, this affects calculated item
response rates, sometimes quite markedly.

RTI's published distributions count as "eligible to answer"
all those not coded 99 (LEGITSKIP); that is, unless definitely ruled out
of the eligible pool, a person is deemed eligible. The eligible pool for
a conditional item is determined by subtracting from the total sample
size only the LEGITSKIPs, which comprise just the "clean SKIP"
responses.* Any routing-item responses which are error-coded are,
therefore, deemed eligible for subsequent conditional items.

* Plus the constant 1,048 people who returned no questionnaire.

To illustrate:

Item F23 asks "Since leaving high school, have you
attended any school . . . ?" Response options are YES and NO. If YES is
marked, the respondent is directed to "SKIP to q. 25." If NO is marked,
he is expected to answer items F24A through F24Q (which are a list of
reasons for not continuing formal education) and to exit Section B.*

By RTI's procedure, only those who gave a "clean" NO to
F23 are ruled ineligible (LEGITSKIP) to answer item F25, which asks
"Were you taking classes or courses at any school during the first week
of October, 1973?" There were 5,447 "clean" NOs to item F23, and the
published LEGITSKIP for item F25 is 6,495, or 5,447 + 1,048. There
were, however, a total of 2,360 additional NOs bearing routing-error
"flag" codes, and none of these are ruled ineligible to answer item F25.

Since they are not ruled out of the eligible pool for F25,
they are treated as eligible. There were 776 cases coded 42 on F23
(NO to F23 and failed to answer any items in F24A-F24Q** and properly
exited from Section B, as directed in F24). These seem clearly ineligible
to answer further questions about postsecondary schooling, yet were
counted eligible for F25--as well as all further items in Section B--
simply because of the routing-error code for item F23.

* Section B is that portion of the questionnaire which deals with
  postsecondary education and training.

** Items F24A through F24Q are a list of reasons for not continuing
   formal education after high school.

The use of code 42 for item F23 indicates that these 776
people answered no subsequent questions in Section B.* They must,
therefore, have been included among those coded 98 (BLANK) for all
remaining items in Section B. Since they contribute to the eligible pool,
but not to the usable response pool, for all these items, their dubious
inclusion will decrease the apparent usable response rates.

The rates can be calculated, of course, after subtraction
of 776 from the eligible pool for any item from F25 through F47GB (the
end of Section B), but it seems undesirable that such an adjustment must
be made to compensate for the vagaries of response coding.
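
The arithmetic of this illustration, and the adjustment it implies, can be sketched in Python. The three constants come from the text; the function names are hypothetical, and the counts of usable and eligible responses near the end are invented purely to show the direction of the change, since the published counts for item F25 are not reproduced here.

```python
# Counts taken from the text.
CLEAN_NO_F23 = 5447   # "clean" NO answers to item F23
NO_FOLLOWUP = 1048    # base-year respondents who returned no follow-up
CODE_42_F23 = 776     # NO to F23, F24 left blank, exited Section B

# Published LEGITSKIP count for item F25.
published_legitskip_f25 = CLEAN_NO_F23 + NO_FOLLOWUP  # 6495


def usable_response_rate(usable, eligible):
    """Usable responses as a proportion of those counted eligible."""
    if eligible <= 0:
        raise ValueError("eligible pool must be positive")
    return usable / eligible


def adjusted_rate(usable, published_eligible, wrongly_eligible=CODE_42_F23):
    """Rate after removing dubious cases from the eligible pool only.

    The removed cases gave no usable answers, so the numerator is
    left unchanged.
    """
    return usable_response_rate(usable, published_eligible - wrongly_eligible)


# Hypothetical counts for a Section B item, for illustration only.
hyp_usable = 9000
hyp_eligible = 15000

published = usable_response_rate(hyp_usable, hyp_eligible)
adjusted = adjusted_rate(hyp_usable, hyp_eligible)

if __name__ == "__main__":
    print(published_legitskip_f25)
    print(f"published {published:.3f}, adjusted {adjusted:.3f}")
```

Because the 776 cases enter only the denominator, removing them can only raise the calculated usable response rate, whatever the true counts for a given item turn out to be.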

This case illustrates one of the simpler difficulties which
arise from the routing-error codes. Where a sequence of routing items
precedes a conditional item, adjustment of the eligible pool to
compensate for such dubious inclusions requires extensive computer analysis
of response patterns. The supplementary paper mentioned above will
present an attempt to perform such an adjustment.** It will illustrate
the difficulties faced by analysts as a result of routing-error coding.

* See the listing for Q23, code 42, and the footnote, on page 1 of
  Appendix E, 1, User's Manual.

** For a small block of 7 items covering schooling costs for the first year
   after high school, which are near the end of questionnaire Section B.

Differing effects on calculated response rates.

All of the routing-error codes affect calculations of item
non-response from the published distributions. Since the "flag" codes
have different meanings, their impact on estimated item non-response
will vary. In some cases, inclusion of "erroneous" responses among
the eligibles can increase the estimated item response rate; in others
(as in the illustration above) it will decrease the rate.
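
Why the direction of the effect varies can be seen from a small numerical sketch. The counts are invented, and the mechanism shown is our simplified reading rather than a description of RTI's tabulations: error-coded cases always enlarge the eligible pool, but they enlarge the pool of usable responses only if they actually answered the conditional item.

```python
def rate_after_inclusion(usable, eligible, added_cases, added_usable):
    """Response rate once error-coded cases are counted as eligible.

    added_cases join the eligible pool; added_usable of them also
    supplied a usable answer to the conditional item.
    """
    if not 0 <= added_usable <= added_cases:
        raise ValueError("added_usable must lie between 0 and added_cases")
    return (usable + added_usable) / (eligible + added_cases)


# Hypothetical baseline: 600 usable answers among 1,000 clean eligibles.
base_rate = 600 / 1000

# 100 error-coded cases who all answered the conditional item.
rate_if_all_answered = rate_after_inclusion(600, 1000, 100, 100)

# 100 error-coded cases who all left the conditional item blank.
rate_if_none_answered = rate_after_inclusion(600, 1000, 100, 0)

if __name__ == "__main__":
    print(f"{base_rate:.3f} {rate_if_all_answered:.3f} {rate_if_none_answered:.3f}")
```

In this sketch the rate rises when the added cases answer more often than the baseline group and falls when they answer less often, which is the pattern of the code 42 cases in the illustration above.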

Working only with the distributions published in the User's
Manual, we have tried to assess the influence of various error codes
on response rates. For selected routing items, we related the number
of each kind of routing error to the proportion of BLANKs for subsequent
conditional items. Our objective was to determine how many of the
BLANK responses might have been contributed by people who erred in
following the routing pattern. This effort yields some suggestions for
modifying the data base to reduce the analytical difficulties posed by the
routing-error codes.

Table 1 shows the contribution of error-coded responses
(to each routing item) to total response for the routing item and its first
conditional item.* The content of the entries varies by code because of
the different possibilities for contribution entailed by each code.**

* That is, the next item in the numerical sequence.

** Refer back to the description of the routing-error "flag" codes
   (page 11) for the explanation of these codes.



TABLE 1

ROUTING-ERROR CASES AS A PROPORTION OF ROUTING-
AND CONDITIONAL-ITEM ELIGIBLES

Code 20

                                                     Per Cent of Item Eligibles
Routing                                    Number    Routing    Conditional(a)
Item No.  Content                          of Cases  Item       Item

F2:       Complete high school?                 14     0.07        0.07
F7A:      Marital status 10/73                 251     1.2         4.1
F8A:      Number of children, if any            52     0.9         1.4
F13B:     Anyone discuss borrowing?             21     0.1         0.3
F21:      Participated in a training
          program since high school?           268     1.3         5.5
F23:      Any kind of schooling since
          high school?                       1,299     6.1        16.0
F25:      Attending any classes 10/73?         226     1.4         1.9
F28B:     Field of study 10/73 academic
          or vocational?                       163     1.3         1.4
F29A:     Attending any classes 10/72?         449     2.8         8.9
F30:      School 10/72 same as school
          10/73?                               435     3.1         8.9
F48A:     Working 10/73?                       285     1.3         3.5
F54A:     Working 10/72?                       361     1.7         3.6

(a) "Conditional item" is the first item conditioned on the routing item; i.e., the next
item in the numerical sequence as given in the User's Manual distributions.






TABLE 1 (Cont'd)

ROUTING-ERROR CASES AS A PROPORTION OF ROUTING-
AND CONDITIONAL-ITEM ELIGIBLES

Code 40

                                                       Per Cent of Item Eligibles    Total Non-
Routing                                    Number      Routing    Conditional(a)     Response Rate--
Item No.  Content                          of Cases    Item       Item               Conditional Item(s)(b)

F2:       Complete high school?              1,727       8.1         8.1               19.1
F7A:      Marital status 10/73                 101       0.5         1.7               44.0
F8A:      Number of children, if any            37       0.6         1.0               [illegible]
F13B:     Anyone discuss borrowing?            183       0.9         2.4               37.3
F21:      Participated in a training
          program since high school?            54       0.3         1.1               26.7
F23:      Any kind of schooling since
          high school?                  [illegible]      3.7         9.6               20.4
F25:      Attending any classes 10/73?          47       0.3         0.4               26.2
F28B:     Field of study 10/73 academic
          or vocational?                       394       3.2         3.3               27.5
F29A:     Attending any classes 10/72?         305       1.9         6.0               46.4
F30:      School 10/72 same as school
          10/73?                               357       2.5         7.3               72.4
F48A:     Working 10/73?                       602       2.8         7.5               11.3 (d)
F54A:     Working 10/72?                     1,164       5.4        11.7               18.2 (c)
                                                                                       13.3 (d)

(a) "Conditional item" is the first item conditioned on the routing item; i.e., the next item in
the numerical sequence as given in the User's Manual distributions.

(b) Where more than one conditional item is contained in the skip pattern, based on averages
for all. Where "Partial Response" is among categories, figure shown is the sum of BLANK
and PARTIAL RESPONSE.

(c) Average for "Reasons for not working" set, excluding "Going to school."

(d) Rate for "Looking for work," item 48C or 54C.






TABLE 1 (Cont'd)

ROUTING-ERROR CASES AS A PROPORTION OF ROUTING-
AND CONDITIONAL-ITEM ELIGIBLES

Code 60

                                                     Per Cent of Item Eligibles
Routing                                    Number    Routing
Item No.  Content                          of Cases  Item(e)

F2:       Complete high school?                  0     0
F7A:      Marital status 10/73                   0     0
F8A:      Number of children, if any             0     0
F13B:     Anyone discuss borrowing?              0     0
F21:      Participated in a training
          program since high school?             0     0
F23:      Any kind of schooling since
          high school?                         521     2.4
F25:      Attending any classes 10/73?           0     0
F28B:     Field of study 10/73 academic
          or vocational?                         0     0
F29A:     Attending any classes 10/72?         230     1.4
F30:      School 10/72 same as school
          10/73?                                 0     0
F48A:     Working 10/73?                        83     0.4
F54A:     Working 10/72?                       123     0.6

(e) See text discussion regarding omission of "proportion of conditional-item eligibles."






Code 20 implies that a response was given. In the table, its
contribution to the pool of conditional-item eligibles is also, therefore,
its contribution to the published response rate.*

Code 40 implies that no response was given.** Its contribution
to the conditional-item eligible pool is also, therefore, its contribution
to non-response. We show the non-response rate for conditional
items for comparison with the contribution of "erroneous" responses to
the conditional-item eligible pool.

Code 60 designates a combination of routing errors. For
reasons to be discussed, we omit its contribution to conditional items.

We can compare the contributions, to routing and conditional
items, made by those who erred in following routing instructions. This
gives us some notion of the impact of each error code on conditional
response rate.

Code 20's contribution to analysis difficulties. Consider code
20 for item F23 in the table. The 1,299 people whose response to F23
was questioned because of later inconsistent responses were only six
per cent of all those eligible to answer F23. These same people can

* For the present purpose, we have assumed that the questioned
response was given for the first conditional item. This is not necessarily
true, since code 20 implies at least one erroneous response somewhere
among several conditional items.

** Where there are blocks of conditional items, code 40 means none were
answered.






account for sixteen per cent of those answering the conditional item (F24:
Reasons for not continuing education after high school). Their
disproportionately large contribution to the eligible pool (and the response
rate) for F24 shows that a small minority of erring respondents to F23
contributed heavily to the responses for F24. If these cases were ruled out
of the data base for F24,* the size of the eligible pool would drop from 8,118
to 6,819 and the usable response rate for item F24A would drop from 79.8
per cent (6,481) to 76.0 per cent (5,182). If they were retained in the
eligible pool for F24A, but dropped from the usable responses, the usable
response rate would drop from 79.8 per cent to 63.8 per cent. Obviously,
where routing-item code 20s make a disproportionately large contribution
to the conditional-item eligible pool, they have a significant impact on
the calculated response rates.
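
The arithmetic behind these three rates can be checked with a short sketch. It is illustrative only: the variable and function names are ours, and it adopts the same simplifying assumption as the text, namely that all 1,299 code 20 cases appear among the usable responses to F24A.

```python
# Counts quoted in the text for item F24A.
ELIGIBLE = 8118   # published eligible pool
USABLE = 6481     # published usable responses
CODE_20 = 1299    # F23 code 20 cases, assumed to be usable F24A responses


def rate(usable, eligible):
    """Usable response rate in per cent, rounded to one decimal place."""
    return round(100.0 * usable / eligible, 1)


# Treatment 1: distributions as published.
published = rate(USABLE, ELIGIBLE)

# Treatment 2: code 20 cases removed from both the pool and the responses.
dropped_entirely = rate(USABLE - CODE_20, ELIGIBLE - CODE_20)

# Treatment 3: code 20 cases kept in the pool, but not counted as usable.
kept_in_pool_only = rate(USABLE - CODE_20, ELIGIBLE)

print(published, dropped_entirely, kept_in_pool_only)
```

The three results correspond to the 79.8, 76.0, and 63.8 per cent figures given above.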

The analyst's interpretation of the questionable responses
can exert an important influence on his results. One cannot be sure which
of the two inconsistent responses (routing- or conditional-item) is true.
Therefore, some decision must be made by the analyst, but whatever
decision he makes will affect response and non-response rates. As we
have just shown, complete elimination of code 20 cases from the
conditional-item eligible pool will result in a decreased response rate. This
happens because the number of BLANK cases remains constant but constitutes a

* This is not a recommendation. However, we suppose some analysts
might wish to work only with unquestionably valid responses.






larger proportion of the reduced eligible pool. Conversely, retention of
code 20s will artificially increase the usable response rate by the extent
to which genuinely erroneous conditional-item responses are represented
among those coded 20 on the routing item.*

To further compound the ambiguity surrounding decisions
about inclusion or exclusion of code 20s, it can happen that a code 20 on
the routing item may result from a "garbage coded" response to a
conditional item. In such a case, the number of usable responses will not
be increased, but the eligible pool will, and the usable response rate
will be somewhat reduced. In the first follow-up survey, this seems to
be an exceptional case, but we comment on it later in remarks about
code 94 (DON'T KNOW).

To this point in the discussion of code 20 responses, we have
considered only the case where an inconsistent response is made, i.e.,
some actual answer is given. But those coded 20 on a routing item need
not have answered all conditional items in a block of related items. A
flag code 20 was assigned if there was at least one inconsistent response
following the routing item.

Thus, in our example above, some of those coded 20 on item
F23 might have answered (say) item F24B, but not F24A. In that case,

* We assume that some of those coded 20 on the routing item erred in
marking the routing item, and that others erred in marking the
conditional item; which response is really erroneous is not certain.




they are considered eligible for F24A but coded as a PARTIAL RESPONSE
(code 93).*

In discussing the example, we said that inclusion of code 20s
would increase the "usable" response rate, and decrease the non-response
rate for a conditional item. Now we must modify that statement. In the
situation we are now considering, the code 20s for F23 become code 93s
(PARTIAL RESPONSE) for item F24A, and, of course, the result is to
increase the non-response rate while decreasing the usable response rate.

The analytical difficulties presented by the code 20s are
mind-boggling. Consider, for example, what we conceive as a "worst case"
situation: An analyst is interested in certain attributes of people who
claimed they stopped their education with high school because their plans
did not require more education. Item F24L is his key selection item,
because it gives that reason.

On the basis of the User's Manual distribution, he finds that
the usable response rate is 79.6 per cent, and that total non-response
(BLANK plus PARTIAL RESPONSE) is 20.2 per cent.** He further sees
that the people of interest to him (those answering "applies to me"
concerning the stated reason) number 2,729 cases, and that there are only
17 "garbage code" cases. He rules out the 3,730 who answered "does

* Despite its label, code 93 means a form of non-response, as described
above.

** See item F24L in our Table 3 (appended). The other 0.2 per cent is
accounted for by other codes and rounding error.




not apply to me" (since they lack the controlling characteristic), and
must decide what to do about the 1,642 non-respondents.

We know that there are 1,299 "code 20" respondents spread
somewhere throughout this distribution, none of whom is coded either
BLANK (98) or LEGITSKIP (99).* But because there are no "flag" codes
for routing-item errors on item F24L, neither we nor the prospective
analyst know how these doubtful responses are scattered among the coding
categories. They might all be among the people of interest ("applies to
me"), or all might be loaded on other response codes, or (more likely)
they may be variously distributed over all codes other than 98 and 99.

Depending on the actual distribution of the code 20s among
the responses to F24L (which, recall, can include the PARTIAL RESPONSE
code 93) and our analyst's decision about whether the code 20s are or are
not valid responses, the actual number of cases available for his study
could be as many as 2,729 (all "applies to me" responses) or as few as
1,430.**
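
The bounds in this "worst case" follow directly from the counts quoted above. The sketch below is illustrative only (the names are ours); it rebuilds the F24L eligible pool from its published categories and then brackets the number of cases the analyst could end up with.

```python
# F24L counts quoted in the text from the User's Manual distribution.
APPLIES = 2729          # "applies to me"
DOES_NOT_APPLY = 3730   # "does not apply to me"
GARBAGE = 17            # "garbage code" cases
NON_RESPONSE = 1642     # BLANK plus PARTIAL RESPONSE
CODE_20 = 1299          # F23 code 20 cases; their F24L codes are unknown

eligible = APPLIES + DOES_NOT_APPLY + GARBAGE + NON_RESPONSE
usable_rate = round(100.0 * (APPLIES + DOES_NOT_APPLY) / eligible, 1)
non_response_rate = round(100.0 * NON_RESPONSE / eligible, 1)

# Upper bound: no code 20 case has to be removed from "applies to me".
most_cases = APPLIES

# Lower bound: every code 20 case answered "applies to me" and is
# then discarded as an invalid response.
fewest_cases = APPLIES - min(CODE_20, APPLIES)

print(eligible, usable_rate, non_response_rate, most_cases, fewest_cases)
```

The categories sum to the same eligible pool of 8,118 used for F24A, and the bounds come out at 2,729 and 1,430 cases.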

We need not carry this "worst case" illustration further,
since the analytical difficulties faced by our fictional analyst must be

* Code 20 in F23 guarantees this, by its definition. See User's Manual
Appendix B.1, Q23, codes 21 and 22.

** Most of those coded 20 (1,079) gave a NO response to item F23 (Any
school after high school?), and should have answered F24L on that
account. However, the truth of their response to F23 is in doubt
because of later responses which suggest they had some postsecondary
education; hence their eligibility for F24L, and the hypothetical study,
is in doubt.






evident. Tracing the F24L responses of those 1,299 doubtful cases will
require several crucial analysis decisions, hours of programming
preparation, substantial computer costs, and possession of the data tape. All
of this must be done before the researcher can even decide whether to
go ahead with his projected study.

These illustrations of the possible difficulties stemming from
code 20 on item F23 by no means exhaust the matter. The reader can
examine other routing items in Table 1 to see the number of equally
difficult instances implied by the proportions of conditional-item eligibles.
We think that where the figures in the paired cells differ markedly, the
analyst will face trouble. Seven of the twelve routing items listed in
Table 1 fit this criterion, and these seven items directly affect a total
of 234 other items in the First Follow-up Questionnaire.*
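
The "paired cells" comparison can be made mechanical. The following is a sketch under a stated assumption: the text gives no numerical cut-off for "differ markedly," so the 1.5 percentage-point threshold below is hypothetical. It was chosen only because it happens to single out the seven items named in the accompanying footnote; the function and variable names are likewise ours.

```python
# Code 20 entries from Table 1: (routing-item %, conditional-item %).
CODE_20_PER_CENT = {
    "F2": (0.07, 0.07), "F7A": (1.2, 4.1), "F8A": (0.9, 1.4),
    "F13B": (0.1, 0.3), "F21": (1.3, 5.5), "F23": (6.1, 16.0),
    "F25": (1.4, 1.9), "F28B": (1.3, 1.4), "F29A": (2.8, 8.9),
    "F30": (3.1, 8.9), "F48A": (1.3, 3.5), "F54A": (1.7, 3.6),
}

# Hypothetical cut-off in percentage points; not stated in the text.
THRESHOLD_POINTS = 1.5


def flag_marked_differences(table, threshold=THRESHOLD_POINTS):
    """Return routing items whose conditional-item share exceeds the
    routing-item share by more than `threshold` percentage points."""
    return [item for item, (routing, conditional) in table.items()
            if conditional - routing > threshold]


flagged = flag_marked_differences(CODE_20_PER_CENT)
print(flagged)
```

A different threshold, or a ratio rather than a difference, would flag a different set (a ratio test would also pick up F13B), so the choice of criterion is itself an analysis decision.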

What is to be done about code 20s? Having elaborated the
analytical difficulties posed by RTI's use of code 20, we feel obligated
to suggest some remedy for the situation. Our first thought was that
code 20 cases on critical routing items (like F23: Any school after high
school) might be deleted from the follow-up data base. This seems
impractical, however, because to delete only F23 code 20s would shrink
the data base by 6 per cent, and the cumulative effect of dropping others

* By count of the items within the routing patterns of items F7A, F21,
F23, F48A, and F54A. Items F29A and F30 are coupled, as screens,
with F23. See User's Manual Appendix E.2.






would reduce it still further.

Instead, we recommend imposing more judgments about the
validity of 20-coded responses. We think it possible to estimate the
probable truth or falsity of such responses by examining subsequent
response combinations.* Such a judgmental reassessment of routing-item
"error" responses can lead to reclassification of responses, largely
eliminating code 20s as a response category but not removing them from
the data base.

In the later discussion of questionnaire format as a source
of data problems, we suggest some ways to forestall the occurrence of
code 20 cases in future waves of the survey. Some of these involve
changes in the physical layout of the questionnaire; others involve changes
in the response options to various items.**

Code 40's contribution to analysis difficulties. We have
already given one example of the impact of routing-error code 40.*** We
showed that 776 people so coded for item F23 (Any schooling after high
school) were inappropriately carried through to the eligible pools for

* We have used such a procedure in the recomputation of eligibles for
items 46A and B (first year school costs), the results of which will
be described in the supplemental paper now in progress.

** We are aware that such changes may make data non-comparable
across survey waves. We discuss this matter in the section on
formatting.

*** Failure to answer conditional items.




every item from F25 to F47GB.

The use of code 40 will not always cause such harm. When
routing instructions are used to shunt some people around one or a few
items in a sequence which is otherwise applicable to all, the eligible
pools will not be unduly inflated. For example, item F7A (What was
your marital status, as of the first week of October 1973?) routes the
never-married around questions about the date of marriage and whether
or not the respondent had any children.* All respondents are then
expected to answer the next question (F9: In October 1973, were you
financially dependent ...). A code 40 on item F7A indicates that no
information was given in the conditional items, but those so coded are
obviously eligible to answer them and to answer later questions.
Calculations of item response rates are not affected in such a case.

As we see it, no useful information is added by code 40,
since it does not appear to flag a genuine routing-pattern error. Without
this flag, those cases would still be in the eligible pool for conditional
items to which the "erroneous" response directs them, and would be
counted BLANK for those items.

But as we have shown, the absence of this flag would prevent
false inclusion of the 40-code cases in the eligible pools for items they

* Possibly unwisely in the latter case, since parenthood does not require
marriage and the estimated illegitimacy rate now runs to about 13 per
cent of all U.S. births. Responsibility for children, legitimate or not,
doubtless affects decisions about work and school.



are not eligible to answer. So, code 40 appears to contribute only
mischief and data-processing confusion.

The data in Table 1 reinforce our belief that code 40 should
be eliminated. The percentages there show that code 40s consistently
contribute a large share of the conditional-item eligible pools. Although
in several places they account for almost half of the total non-response for
conditional items, they are always well within the general pattern of
non-response. In other words, when the use of code 40 is not doing harm, it
adds nothing to our understanding of conditional responses.

We recommend discontinuation of the 40-code flag.

Code 60's contribution to analysis difficulties. As shown in
Table 1, respondents with mixed routing errors (code 60s) make up a
rather small part of the data base. Even if there is no overlap among
those making such errors (an unlikely event), they would make up no
more than 4.8 per cent of all respondents.

If there is no overlap among respondents making code 60
errors on the crucial status items* listed in Table 1, deletion of all such
cases would reduce the available data base to 20,393, for an overall
follow-up response rate of 91 per cent. If there is complete overlap
(i.e., all subsequent code 60 errors were made by those so coded for

* F23: Any schooling after high school; F29A: Enrolled in October
1972; F48A: Did you hold a job, first week October 1973; F54A: Did
you hold a job during October 1972?


F23), the data base would be reduced to 20,829, for an overall response
rate of 93 per cent. Since there is probably partial overlap, the true
effect would be to reduce the overall response rate to something between
91 per cent and 93 per cent.
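
The two limiting cases can be reproduced from the Table 1 counts. This is a sketch under assumptions: the respondent total of 21,350 and the 1,048 "general non-respondents" are figures quoted elsewhere in this paper (in the discussion of item F11B), and treating their sum as the base for the overall response rate is our inference, not something the text states. The names are ours.

```python
# Code 60 case counts from Table 1 for the four crucial status items.
CODE_60_CASES = {"F23": 521, "F29A": 230, "F48A": 83, "F54A": 123}

# Assumed bases, taken from figures quoted in the F11B discussion.
RESPONDENTS = 21350
GENERAL_NON_RESPONDENTS = 1048
SAMPLE = RESPONDENTS + GENERAL_NON_RESPONDENTS


def overall_rate(remaining):
    """Overall follow-up response rate in whole per cent."""
    return round(100.0 * remaining / SAMPLE)


# No overlap: every code 60 case is a different respondent.
no_overlap_base = RESPONDENTS - sum(CODE_60_CASES.values())

# Complete overlap: all later code 60 cases are among those coded 60 on F23.
full_overlap_base = RESPONDENTS - CODE_60_CASES["F23"]

print(no_overlap_base, overall_rate(no_overlap_base))
print(full_overlap_base, overall_rate(full_overlap_base))
```

Under these assumptions the data base falls to between 20,393 and 20,829 cases, which brackets the overall response rate between 91 and 93 per cent.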

Since there must be almost total uncertainty about the true
enrollment or work status of code 60 respondents, their information must
be considered highly unreliable and probably should not be used.
Considering the reduction of analytical difficulties to be gained by deleting these
cases, and the fact that an overall response rate of 91 per cent or better
would be quite respectable, we think it advisable to rid the data base of
these highly ambiguous cases.

Closing comments on routing-error codes. Our suggestions
for treatment of the routing-error responses require a degree of
willingness to intervene in the data which RTI rightly abjured. It is emphasized
that we recommend such intervention only if NCES wishes to make
available a parallel analysis tape, on an optional basis, to prospective users.
In no event should the RTI documentary version be replaced by a modified
data tape, since some users may prefer other treatments. The
treatments thus far recommended would serve only to reduce the amount of
ambiguous data and eliminate sources of analytical difficulty. They will
not supply missing data for non-respondents, but they will help to fix
more accurately the number of persons eligible to answer given items,
and they should eliminate most of those cases for which data may be





supposed unreliable owing to respondent inability or unwillingness to 
follow instructions. 

ANALYTICAL DIFFICULTIES ARISING FROM OTHER CODING 

The influence of the routing-error codes upon the calculation
of LEGITSKIP and BLANK is the major coding source of analytical
difficulties, but there are others. This section is a rundown of miscellaneous
observations about how the code structure might be altered to make
analysis easier.

Code 98--BLANK.

We think that frequency counts listed as BLANK are inflated 
by factors other than the mechanical inclusion of routing-error codes. 
These are chiefly by-products of the instructions and response options 
given in the questionnaire. 

An example of one extreme case will give the reader an idea 
of how BLANK counts can be inflated by such factors. 

Items F11B, D, F, and H ask for information about the 1973
income of the respondent's spouse. The exact wording of item F11 is:
"What is the best estimate of your income before taxes for all of 1973?
If you are married, please estimate your husband's or wife's income in
the second column provided. Do not include loans or gifts." Below the
question is a list beginning with total income, then seeking source details:
"from wages, salaries, ...," "scholarships, fellowships," and "other
(for example, interest)." Two response columns are supplied, one




headed "Your Own Income," the other "Your Spouse's Income." Item
F11 is not contained within any routing pattern, i.e., it is to be answered
by all respondents.

The published response distribution for F11B (spouse's total
income) lists 17,597 BLANKs and only 3,519 usable responses, for a
usable response rate of 16.5 per cent. The large number of BLANKs is
mainly a result of (1) the failure to condition response about spouse's
income directly upon item F7A (What was your marital status, as of the
first week of October 1973?) and (2) the format of the instruction and
response for F11B, D, F, and H.

From the standpoint of machine processing, all respondents
were eligible to answer items F11B, D, F, and H. Thus, lack of response
in the "spouse" column was automatically entered as BLANK rather than
LEGITSKIP.

In the absence of some instruction requiring a positive entry
(such as, "Write NONE in the second column if you were not married in
1973"), blanks there are hard to interpret. They could mean that there
was no spouse, or that there was a spouse who had no income, or that
there was a spouse with income but the respondent can't estimate it, or
that there was a spouse with income which the respondent won't divulge.
The code category BLANK presumably includes all of these.

Second, positive entries are also hard to interpret. The
instruction says "if you are married," implying "married at the time




you are filling out this questionnaire." Some respondents may have been
married or divorced during the several months between the first week
of October 1973 (the reference week in item F7A) and the date they
completed the questionnaire, which ranges through February 1974. Entries
of "spouse's 1973 income" from those married in the interim may have
little or no bearing on their own education or work experience through
late 1973. Those divorced in the interim (hence "not married now")
presumably will have left the "spouse's income" column blank, even though
their own education and work experience through late 1973 would have
been affected by their (former) marital status.

The number of BLANKs for this item is thus affected by a
number of formatting and data-processing factors. It is patently absurd
to suppose that all respondents were indeed eligible to answer about a
spouse. Therefore, we have used item F7A, items F7B and C (date
married), and Census information to make a crude estimate that perhaps
4,050 respondents were married in 1973 and thus eligible to answer the
question. The remainder (17,300) we treat as LEGITSKIPs.* On this
basis, the number of BLANKs for F11B (spouse's total income) becomes
only 531, and the revised usable response rate rises from 16.5 per cent
to 86.9 per cent.
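
The recomputation for F11B amounts to swapping the denominator. The sketch below is illustrative (the names are ours) and takes the respondent total to be the sum of the two groups given in the text, 4,050 estimated eligibles plus 17,300 treated as LEGITSKIPs.

```python
# Figures quoted in the text for item F11B (spouse's total income).
USABLE = 3519              # usable responses
ELIGIBLE_ESTIMATE = 4050   # crude estimate of respondents married in 1973
LEGITSKIPS = 17300         # remainder, reclassified from BLANK

# All respondents, as the machine processing treated them.
respondents = ELIGIBLE_ESTIMATE + LEGITSKIPS

published_rate = round(100.0 * USABLE / respondents, 1)
revised_blanks = ELIGIBLE_ESTIMATE - USABLE
revised_rate = round(100.0 * USABLE / ELIGIBLE_ESTIMATE, 1)

print(respondents, published_rate, revised_blanks, revised_rate)
```

The usable responses are unchanged; only the eligible pool shrinks, which is why the rate moves from 16.5 to 86.9 per cent and the BLANK count falls to 531.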

We think that the usable response rates for some other items 



* 17,300 excludes the constant 1,048 general non-respondents.




are affected in similar ways by the combination of instructions, response
options, and automatic processing, although we did not attempt to revise
the distributions for them. Our belief that usable response rates are
sometimes understated lends a cheering note to this discussion, but the
amount of work required to revise item F11B indicates that considerable
effort will be required to reassess item eligibilities and recode cases
from BLANK to LEGITSKIP.

BLANK and DON'T KNOW. The frequency of "DON'T KNOW"
responses (code 94) is astonishingly low for all of the items we examined,
while that for BLANK (and PARTIAL RESPONSE) is persistently high. We
think that many of the BLANKs are probably a way of expressing "don't
know" or related responses (e.g., "can't recall"). We are not, of course,
able to reassign BLANKs to "DON'T KNOW" as we were for some
reassignments to the LEGITSKIP category.

Inspection of the First Follow-up Questionnaire (but not an
actual item count) indicates that appropriate "uncertainty" response options
are provided for no more than half-a-dozen questions, even though the
questions often ask for details, time-remote events, or facts about others
which are not likely to be well-known to the respondents.

For future waves of the survey, we strongly recommend 
greater use of "uncertainty" options. This suggestion may be opposed 
on two grounds, (1) that it provides a loophole for denial of information 
and (2) that it merely substitutes one uninterpretable category for another. 






We reject these objections because we think that "uncertainty" options
will improve the amount of valid information (as distinct from technically
valid data) obtained.

Experience shows that respondents will often supply a "firm"
or positive answer even when, in fact, they do not have well-grounded
beliefs, facts, or attitudes. They do this simply to "help" a researcher
or to give the appearance of being well-informed. When they are not
shown, by an offered response option, that "don't know" or the like is
a perfectly acceptable response, they may create a "fact" on the spot in
order to satisfy seeming demands. Inclusion of a "don't know" option
would reduce the number of such responses. Since BLANK (non-response)
is already available as a way to deny information, the addition of
appropriate "uncertain" response options can only improve the validity of
information.

On the second point, "don't know" or like responses are
interpretable, but BLANK is not. Analysis of such responses can reveal
the extent to which ignorance, apathy, and fading memory influence
decisions and acts. "Don't know" and other "uncertainty" response options
thus can provide a great deal of useful information which may otherwise
be buried in the BLANK category.

Closing comment on BLANK. In a later section of this paper,
we discuss methods available for estimating values to be assigned to
BLANK responses. While some of those methods are rather sophisticated,



none are as desirable as preventing the occurrence of BLANKs. As we
have shown, the high frequency of BLANKs has been induced partly by
artificial inflation, partly by inappropriate questionnaire design, and
partly by absence of response options that might be used. At least as
much effort should be given to preventing future BLANKs as will
doubtless be given to supplying missing data. Because overall response rates
are likely to fall continuously over the duration of this (like any)
longitudinal survey, we think that item non-response in the returns simply
cannot be afforded from an analysis standpoint. Therefore, it is an
urgent necessity that our (and others') suggestions be field tested and, if
successful, implemented in future waves of the survey.

Response Option: "Does not apply."

The obverse of BLANK inflation is inflation of "usable"
responses. We think the response option "does not apply to me" inflates
rates of usable response for many items, among them some that may
be the origin of 20- and 60-routing-error codes on critical routing items.*

The options "applies to me" and "does not apply to me" appear
to have been used in lieu of more straightforward "YES" and "NO,"

* Such as F24 (Reasons for not continuing education after high school),
F29B (Reasons for not continuing education right after high school),
F48B (Why were you not working during the first week of October 1973),
and F54B (Why were you not working during October 1973). Some other
items which use "does not apply to me" are F31 (Reasons for changing
schools), F35 (Reasons for changing academic field), and F38 (Reasons
for dropping out of school).






probably to suit the wording of the questions involved. Our examination
of response distributions leads us to think that fairly large numbers of
respondents used "does not apply to me" as if it pertained to the whole
issue raised by the question. For example, some who had not stopped
their education after high school probably circled "does not apply to me"
for at least a few of the reasons for stopping listed in F24. Their
meaning of the response is that the whole matter of reasons for stopping does
not apply to them; this is not, of course, the meaning intended by the
questionnaire designer(s).

The use of "does not apply to me" as a response option
probably has inflated the usable response rate for affected items. Perhaps
worse, it would reduce the proportion of "applies to me" responses for
any particular item so coded, thereby biasing conclusions about the
importance of the reason (or whatever). As we have mentioned, its most serious
and pervasive effect may be the creation of many 20- and 60-error-coded
responses to routing items.

We think respondents should not be offered such an opportunity
for error, and that analysts would be better served by the much less
ambiguous YES and NO options. Questions can be worded to suit the
options, rather than the converse, and we think they should be.

Code 93--PARTIAL RESPONSE.

This code is used when there is no response to an item which
is part of a set of related items (e.g., reasons for dropping out of school)






and at least one item in the set has been answered. From an analysis
standpoint, it is chiefly a nuisance. Data processing runs must be
programmed to add the frequencies of PARTIAL RESPONSE and BLANK to
tabulate the total non-response for items so coded.

The code is intended to describe the context in which the
non-response occurred, differentiating those who failed to answer any items
in the set from those who answered at least one. The code may be helpful
in analyzing such matters as routing errors, but to the substantive analyst
it is merely an obstacle in data processing.

The information contained in the code could be better con- 
veyed to the substantive analyst by addition of a general item code on the 
data tape. Such a variable would use three codes, designating (a) response 
present for all subitems, (b) response absent for all items, and (c) re- 
sponse present for only some items. 

Analysts interested in the context of non-response on a
particular subitem could then sort on the general item code as a first
processing step. Others not interested in the context could avoid programming
for the addition of PARTIAL RESPONSE and BLANK. We think the latter
are likely to outnumber the former among prospective users. We suggest
that for their convenience--and to remove opportunity for error in
calculating response rates--the special code 93 should be dropped, and those
cases shifted to BLANK (98). Such a move would require creating the
general item variable, as suggested.
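
The proposed recoding might be sketched as follows. This is an illustration under assumptions, not a description of any existing tape layout: the labels "a", "b", and "c" simply echo the three designations above, the function names are ours, and each respondent is assumed to be eligible for the whole set of subitems.

```python
BLANK = 98
PARTIAL_RESPONSE = 93

# Hypothetical values for the proposed three-way general item code.
ALL_PRESENT, ALL_ABSENT, SOME_PRESENT = "a", "b", "c"


def general_item_code(subitem_codes):
    """Summarize one respondent's answers to a set of related subitems."""
    missing = [code in (BLANK, PARTIAL_RESPONSE) for code in subitem_codes]
    if not any(missing):
        return ALL_PRESENT
    if all(missing):
        return ALL_ABSENT
    return SOME_PRESENT


def merge_partial_into_blank(subitem_codes):
    """Shift code 93 cases to BLANK (98), leaving other codes untouched."""
    return [BLANK if code == PARTIAL_RESPONSE else code
            for code in subitem_codes]


# One respondent who answered only the first of three subitems
# (1 stands in for any substantive response code).
original = [1, PARTIAL_RESPONSE, PARTIAL_RESPONSE]
recoded = merge_partial_into_blank(original)
print(recoded, general_item_code(recoded))
```

After the merge the subitems carry only BLANK, yet the general item code still records that the set was partly answered, so the context formerly carried by code 93 is not lost.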




Uncodable Responses--OUT OF RANGE (95) and MULTIPLE RESPONSE (96).

We think these two codes could be eliminated and the cases
merged in a single code category designated UNCODABLE. Neither
existing code tells us anything more than that a response was given which
did not fit the coding scheme (unless we are interested in why it doesn't
fit). We think few analysts will care why a response is uncodable, and
for those who do,* the unaltered RTI documentary tape would be available.

Our purpose in recommending the merger and re-labeling of
these two categories is simplification of data processing. There are few
cases in either category (with the exception of an apparent data-processing
mistake for item F41CA: Number of credit hours attained after high
school, for which 19,947 OUT OF RANGE responses are listed), and
ease and economy of data processing seems more important than retention
of details on a handful of cases.**

Comments on Category Labels.

In keeping with our insistence that opportunities for error be
kept to an absolute minimum in the whole survey process, we think some
response category labels should be altered because they misrepresent
what is actually encoded.

* Such as people concerned with possible modification of the NLS-HS
questionnaire.

** RTI's tabulation error points out that it is advisable to hold the
number of codes, hence the opportunities for error, to a minimum.

BLANK does not include all BLANKs; there are also PARTIAL
RESPONSES and LEGITSKIPs. If PARTIAL RESPONSE were eliminated,
as suggested above, the important distinction between BLANK and
LEGITSKIP would be preserved and better characterized by use of MISSING
or MISSING DATA instead of BLANK.

OUT OF RANGE does not include all out-of-range responses.
Cases in that category are far better described by our recommended
UNCODABLE, because OUT OF RANGE refers to cases which could not
be processed by the machine-reading used to convert raw responses to
coded data.

Truly "out of range" responses are listed by RTI in Table
5 of the User's Manual (which is itself mislabeled with the code 95 name
although it has no bearing on that code category). Responses which fit
the coding scheme but were deemed highly improbable are enumerated
in Table 5. In the published distributions, these cases are separated
from the code 95 OUT OF RANGE responses. The latter are likewise
not included in the tallies given in Table 5.

Although the data tapes contain these "outliers" just as
reported, the response distributions published in the User's Manual do not
clearly distinguish them as a separate category of dubious responses,
and present an unneeded opportunity for error. We think that the label
OUT OF LIMITS, as a separate response category, should be used in
the published distributions. All such cases would then be reported in
that category. User's Manual Table 5 (retitled) and its accompanying
discussion would then link without ambiguity to a particular category in
the distributions. If it seems desirable to distinguish them (and we think
it unnecessary), two categories, BELOW LIMITS and ABOVE LIMITS,
could be used.

For most items, the category DON'T KNOW (code 94) is
misleadingly labeled because it does not reflect an actual response option.
For any item where "don't know" was not an available response option in
the questionnaire, it offers a falsely precise statement of the number of
people who "did not know." We are unclear how RTI determined that the
respondent "didn't know" an answer for most items, although a few cases
(possibly the result of key punching errors) are so listed for every item.
We presume that most such cases were determined either by manual
coding of write-in responses or from telephone call-backs.

As we have suggested, much more than relabeling is needed
to cope with this coding problem, and we refer back to the earlier
discussion for a proposed solution.

SUMMARY 

We reiterate that RTI's coding scheme is, on the whole, well
designed for the assessment of certain technical problems, such as item
wording, instructions, and format. Since we suppose that the majority
of data users will not engage in modifications of the questionnaire, and
might be confused or misled by features of the present scheme, we
recommend that any parallel "general use" data tape employ a coding scheme
modified along the lines we have suggested.

In our opinion, a general-use data tape probably requires
more judgments than RTI permitted itself about the "true state" of the
respondent. We think that such judgments can be made on rational and
mostly empirical grounds. Our recommendations are offered with a
view to altering the coding scheme to incorporate such judgments and to
eliminate codes made necessary by the restraints on judgment under
which RTI worked.

We stress those suggestions that bear on coding which affects 
the calculation of LEGITSKIP (the number of respondents not eligible to 
answer an item). It seems essential that the coding scheme not introduce
confusion about the size of the "eligible" pool, since any potential user 
should have a primary interest in the completeness and representative- 
ness of the available data for an item. 

Our suggestions do not modify the "true" situation with respect
to item non-response--that is, they do not supply missing data--but they
should make it easier for analysts to assess the adequacy of the data base.

This detailed discussion of problems associated with coding 
has preceded the discussion of patterns of item non-response to make the 
reader fully aware of the problematic nature of much of the information 
on which we have based the analysis of item non-response. No additional 
caveats are incorporated in the subsequent discussion.




ITEM NON-RESPONSE
Problems associated with item non-response.

As we have noted, item non-response poses several difficulties
for the analyst. Besides biasing data in a manner which is difficult
to correct, it may sharply restrict the amount of usable data. An analyst
who tries to examine a number of items concurrently, or to relate
responses from several waves of a longitudinal survey, may find this
almost impossible.

For example, if he wishes to relate two items, each with a
75 per cent usable response rate, it is quite possible that no more than
50 per cent of the respondents will have answered both items. If more
ambitious efforts, say analysis of interaction among four or six items,
are to be undertaken, the overlaps may be so poor that the analysis will
have to be abandoned or that some method must be employed to "plug"
gaps in the data base. Unfortunately, the analyst cannot tell, from the
response distributions published in the User's Manual, whether or not
his proposed study will be seriously impaired by such problems. Only
when analysis is underway (after the data tape has been acquired) will he
be able to crosstabulate responses to determine how item non-response
affects his analysis plan.*
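The arithmetic behind this example can be made explicit. The sketch here is an illustration only: it assumes every item shares the same pool of eligible respondents, and the function names are hypothetical. If item i has usable response rate p_i, the share of respondents answering all k items can be no lower than max(0, p_1 + ... + p_k - (k - 1)) and no higher than the smallest p_i; if omissions were independent across items (an assumption the published distributions cannot confirm), the share would be the product of the rates.

```python
def joint_response_bounds(rates):
    """Return (lower, upper) bounds on the share answering every item."""
    k = len(rates)
    lower = max(0.0, sum(rates) - (k - 1))
    upper = min(rates)
    return lower, upper


def joint_response_if_independent(rates):
    """Share answering every item if omissions were independent."""
    share = 1.0
    for rate in rates:
        share *= rate
    return share


# Two, four, and six items, each with a 75 per cent usable response rate.
for k in (2, 4, 6):
    rates = [0.75] * k
    low, high = joint_response_bounds(rates)
    print(k, low, high, round(joint_response_if_independent(rates), 3))
```

For two items the lower bound is the 50 per cent cited in the text. For four or six items the lower bound falls to zero, which is why single-item distributions alone cannot tell the analyst how much overlap to expect.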

* This must occur when single-variable distributions are published.
NCES might want to perform, for a fee, skeletal cross-tabulations
which would provide prospective users with joint-item response rates.

The shape and variability of response distributions may differ
greatly for items with different response rates. Because population
estimates drawn from such sample distributions will vary in precision,
meaningful comparisons may be impossible. Certain kinds of indexes, such
as ratios between average total income and average unearned income,
may be precluded.

Biases resulting from general non-response can usually be 
suppressed by some scheme that applies across all items, such as re- 
weighting the sample. Corrections for item non-response cannot use 
such schemes, because the items are not independent and because too 
many separate weights would be required. Some other adjustment scheme 
must be used to preserve the utility of the data.

Patterns of item non-response. 

Procedure. We have analyzed non-response by examining
patterns of high and low rates as they appear in the data. Our main focus
is on patterning associated with specific types of information, although
we have also considered item sequencing and other format
characteristics.

We examined a subset of items selected because of their
central importance for policy inquiries. These are listed, by number within
response levels, in Table 2.* We chose those items that supply
information about respondents' current (1973) activities and statuses
(e.g., marital status), school and training enrollment and costs, sources
of funds for college or other schooling, work experience, income and
sources of income, reasons for various acts and decisions, and certain
personal and social characteristics (some acquired from about 4,500
respondents only through the First Follow-up Questionnaire, Form B).*
Counting all subitems (items which are part of a set of related items) as
separate entities, we calculated response rates for 204 distributions
published in the RTI User's Manual. The number of items omitted is
relatively small.

* They are also listed, by straight numerical sequence but without brief
content synopses, in Table 3.

The items examined constitute the core data sources of the 
first follow-up survey. They cover all of the matters most likely to con- 
cern policy analysts charged with assessing the past or future roles of 
the Federal government in secondary and postsecondary education. They 
will be the source of information, for example, for such critical issue 
areas as the transition from school to work, access to and persistence 
in postsecondary education and training, and (from future survey waves) 
economic and other returns to education. 

* In the order mentioned, see for example items F1A-F, F7A, F21-F23,
F25, F29A, F39, F46, F47, F48A, F54A, F11, F24, F29B, F31, and
F78-83. This list does not include every item examined.

Excepting item F64 (Since leaving high school, have you served
in the Armed Forces . . .), which we excluded along with all other items
on military service,* we examined all of the "key" items listed by RTI
in User's Manual Appendix B.

RTI used key items to assess the acceptability of a
questionnaire. If answers to any were absent, the respondent was contacted
by telephone to resolve problems; if answers to all the key items were
present, the questionnaire was accepted and appropriately coded and
punched.

An important digression is in order here. RTI's key items
omit some which we think crucial, particularly items 46 and 47 (school
finances), item F11 (1973 income), and all of the "background
characteristics," except race, in items 86 through 99 of Form B--the only
source of direct information on these matters for about 4,500 respondents.
We think the "key item" check list should have been more comprehensive,
and that the key items in future waves should emphasize information
crucial to Federal policy concerns. Because educational financing and
equal access are central to Federal policies on higher education, we urge
that "key items" for future waves of the survey include pertinent items
such as those we have mentioned.

* Because very few respondents had been involved in the military.

To return to our procedure:

We focused on usable response rates, by which is meant the
proportion of responses, out of all ruled eligible to answer a given item,
that can be interpreted without ambiguity. This definition implicitly
incorporates as "non-response" all cases in which an expected answer
was omitted and those for which an answer that was given cannot be used
in analyses, i.e., cases bearing "garbage" codes and those bearing
routing-error codes.
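The definition just given can be written as a small calculation on a published distribution. The sketch here is hypothetical throughout: the category labels, the choice of which labels count as unusable, and the counts are all invented for illustration and do not come from the User's Manual.

```python
def usable_response_rate(counts, unusable, legit_skip="LEGITSKIP"):
    """Usable responses as a proportion of respondents eligible for the item.

    counts     -- mapping of category label to number of cases
    unusable   -- labels treated as non-response (omitted, "garbage",
                  and routing-error categories)
    legit_skip -- label for respondents not eligible to answer
    """
    total = sum(counts.values())
    eligible = total - counts.get(legit_skip, 0)
    if eligible <= 0:
        raise ValueError("no eligible respondents for this item")
    not_usable = sum(counts.get(label, 0) for label in unusable)
    return (eligible - not_usable) / eligible


# Invented distribution for one conditional item.
distribution = {
    "1": 600,
    "2": 200,
    "BLANK": 120,
    "MULTIPLE RESPONSE": 30,
    "ROUTING ERROR": 50,
    "LEGITSKIP": 1000,
}
rate = usable_response_rate(
    distribution,
    unusable=("BLANK", "MULTIPLE RESPONSE", "ROUTING ERROR"),
)
print(rate)
```

Because LEGITSKIP cases are removed from the denominator, any miscoding that shifts cases into or out of that category changes the rate even when the number of usable answers is unchanged.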

Cases with routing-error codes were excluded from usable
response because they indicate doubt about the accuracy of routing-item
responses. Had comparable codes been used to "flag" suspect responses
to conditional items--as we have argued they should have been--we would
also have excluded those responses. Under the present circumstances,
however, this was a practical impossibility. Therefore, the calculated
"usable" response rates for conditional items include some responses
whose accuracy is in doubt.

We recognize that the inclusion of dubious responses to
conditional items inflates the calculated usable response rates. Since
we--like any prospective user relying on the published distributions--were
unable to distinguish the suspect conditional responses from the others,
we were forced to assume that all items were about equally affected by
them. While we doubt that this is empirically true, we made the
assumption in order to carry out the analysis of response patterning.

For our analysis, we graphed the usable response rate in
a strip graph with items ordered according to their numerical sequence
in the questionnaire. Some of our comments follow from inspection of
this graph. Their grounding may not be obvious from the same data
presented in other formats, but the graph is omitted from this paper
because it is unwieldy. An interested reader can reproduce the graph in
strip form from the data given in Table 3 (appended).

Statements based on our analysis of response patterns are 
to be regarded as diagnostic hypotheses, or plausible explanations based 
on conjecture. We emphasize that recommendations based upon these 
hypotheses need field testing before they are implemented for collection 
of basic NLS data. 

Varieties of patterning. 

Instrument length and format. Decreasing rates of usable
item response, referred to here as attenuation, are to be expected in
any self-administered questionnaire, as respondents tend to tire of
answering questions. Some will simply quit, at some point, if there is no
external impetus to complete. The complexity of the routing patterns in
the NLS-HS instrument makes assessment of attenuation difficult, since
there are several possible beginning-end sequences.

It is possible, however, to compare response rates for
items at the beginning of questionnaire Section A with those for the
background-data items in Section E, both of which all respondents were
supposed to answer. The highest response rates in the initial portion
of the questionnaire run in the vicinity of 95 per cent, whereas those
for background data items in Section E run about 90 per cent.* This
suggests that attenuation attributable to the length of the questionnaire
may be no more than 5 per cent. This is relatively small for such a
long and complex self-administered questionnaire, and leads us to
wonder whether respondents answered questions in the sequence they
were asked.

The sheer length of the NLS instrument seems to exert
surprisingly little effect on item non-response. Therefore, we attribute
most of it to the questionnaire's complexity, especially its complex
routing patterns.

* For Section A, we include items 1A, 4, 5, 6A, 6B, 9, 10, 12, 16A.
For Section E, items 78A, 78B, 79, 80A, 80B, 80C, 81. The mean
usable response rate for Section A items is 93.7; for Section E, 90.1.

One possible reason for routing-pattern errors may be that
some respondents answered the questionnaire from back to front or by
skipping around among sections in the instrument. There was nothing
to prevent this pattern of response (as there is in a personal interview).
Since we may presume most respondents knew the most efficient mode
of multiple-choice test-taking (leaving the hardest questions till the end),
it is reasonable to suppose that many followed the same practice in
completing the questionnaire. No amount of care in the construction of
routing patterns, so long as they assume Item 1-to-Item N sequencing of
answers, will wholly eliminate skip pattern errors and their attendant
loss of usable response. Some formats, however, may be less vulnerable
than others to respondents' carelessness.

One alternative approach to routing would make use of
special blocking, variations in type style, physical alignment of response
options, and the like to direct attention toward the proper item sequence.
Figure 1 illustrates, for one troublesome sequence, how physical layout
changes might be employed to reduce skip errors.

The layout of the First Follow-up instrument was evidently
designed to facilitate editing, coding, and other processing. This purpose
is not necessarily compatible with that of directing the user (respondent)
through the proper item sequences. Figure 1 is laid out for the
convenience of the user, and will probably prove less convenient for direct
keypunching, since items and response options are "jumbled" from the
standpoint of keypunching.

We may note that use of the form of layout suggested in Figure
1 would probably require printing the instrument across the eleven-inch
axis of each page. We think this not only necessary but highly
advantageous, since in that format the most convenient way to read the
questionnaire is probably from page one to page N. (The inconvenience of
other sequences can be checked by attempting to read any booklet containing
tables printed across the long axis. It is not impossible to read "out of
order," but troublesome.)

The layout format of Figure 1 attacks two sources of skip
error. It discourages random entry, owing to clumsiness imposed by
the 90-degree rotation. It also makes clear that different choices of
sequence flow from particular responses to the routing items: alternative
sequences are placed side-by-side, to indicate either-or decisions, much
as a fork in the road requires an exclusive choice of path. Lines drawn
around exit blocks serve to set the pathways apart visually, and the
instruction GO DIRECTLY TO, instead of SKIP, directs attention to the
item destination rather than to the pathway excluded.

[Figure 1 (schematic page layout; not reproduced). Legible elements:
routing items 23 (schooling since high school) and 25 (in school 10/73),
each with Yes/No responses; item 24 (reasons why not); items 26A (what
school), 26B (what kind of school) and 26C (public or private); and
boxed exit instructions "GO DIRECTLY TO Q. 29A," "GO DIRECTLY TO
Sec. C, p. 15," and "Do not answer any other questions on this page."]

Fig. 1.--Possible layout for routing pattern, items F23 to F26 (schematic).

Use of different type styles for routing-item responses
likewise alerts the reader that something is different about the alternative
choices. The arrows (which flow directly from the response word--
rather than code--to the appropriate item number or direction) should
serve to provide "closure" for the implicit question "why are the type
styles different?" Use of red, stylized arrowheads* serves further to
direct attention to the fact that the respondent is expected to go
somewhere other than down the page to the next item, and the juxtaposed item
destination tells where.

Physical layout devices like these would, we think, serve to 
make the questionnaire better suited to self-administration than is the 
original layout. 

* Shown as a stylized arrowhead symbol in Fig. 1.

Still another possible approach would be to adopt a "tax-return"
format, where whole series of conditional items would be placed on
separate sheets or "schedules."



With this approach, those items which every respondent is
expected to answer would be consolidated in a basic booklet. Depending
on his response to each screening (routing) item in this basic form, the
respondent would then be instructed to complete specified schedules.
Each supplementary schedule would contain only those items appropriate
to the screening response, and would bear a prominent instruction to
"complete this schedule only if you said ____ to Question X." General
instructions for the whole instrument would emphasize that the respondent
will not need to complete every schedule enclosed and that he should pay
careful attention to the instruction on each schedule.

This format has the disadvantage that the respondent may
lose, or choose to ignore, whole blocks of items. It may risk greater
information loss than more ordinary "routed" sequences, but "routed"
formats do not prevent the respondent from skipping whole blocks, so the
net difference in risk may be small.

The "tax form" format seems likely to offer several
advantages. First, it should reduce errors caused by random entry. Second,
it may encourage overall response by demonstrating that not every
question will have to be answered. Third, should there be missing
information, it would be possible to mail out only the appropriate schedule,
rather than a whole new questionnaire, for follow-up.* Fourth, by
appropriate omission of schedules, it would be possible to "tailor"
instruments to the experience of the respondent as known from earlier waves
of the questionnaire, perhaps thereby inducing a higher general response
rate as the longitudinal survey progresses.* Finally, it would be possible
to request some detailed information only from subsamples, since
schedules requesting details could be sent to only (say) 20 per cent of
all respondents. This, too, might enhance general response rates.

* With an appropriate cover letter and special instructions.

As noted, we suspect part of the routing errors, to which we
attribute much of the item non-response, occur because respondents do
not proceed through the questionnaire from the first to the last question.
If so, it is imperative to use some format that, as nearly as possible,
eliminates the dependence of routing on following a particular item
sequence or (failing that) to make the correct sequence so obvious that it
is hard not to follow it. Our formatting ideas attempt to achieve this.

Attenuation within blocks. A second major pattern involves
response attenuation within sets or blocks of items. There appear to be
several different sources for this.

The first seems to depend on the probability that an offered
response option applies. Items F1 and F16** illustrate that matter as
shown in Figure 2. The relationship shown there is imperfect, but it
seems evident that response rates drop as the probability (percentage
"applies to me") of the activity decreases. In these examples, response
above some "base" rate (here, about 85 per cent for specified activities)
depends on the proportion of people to whom the item applies.

* This notion is predicated on the assumption that response is more
likely if some evidence is given that previous information is actually
being used in some way. An instrument "package" that shows both
concern for the respondent's time and awareness of previous responses
should help in this regard.

** Present (1973) activity and anticipated (1974) activity, respectively.
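The strength of the relationship in Figure 2 can be gauged with a simple correlation, sketched here as an illustration rather than as part of the original analysis. The figures are transcribed from Figure 2 (a few digits are hard to read in the source copy), the "Other" category is left out because the text refers to specified activities, and the helper function is hypothetical.

```python
from math import sqrt

# (activity, per cent "applies", per cent usable response), from Figure 2.
F1_PRESENT = [
    ("Working", 64.3, 94.6),
    ("Academic course", 41.5, 90.0),
    ("Homemaker", 16.0, 86.2),
    ("Vocational course", 14.2, 87.3),
    ("Military", 5.4, 86.6),
]
F16_ANTICIPATED = [
    ("Working", 73.6, 92.8),
    ("Academic course", 48.2, 88.9),
    ("Homemaker", 21.4, 84.8),
    ("Vocational course", 20.0, 85.2),
    ("Military", 6.2, 84.4),
]


def pearson(rows):
    """Pearson correlation between the second and third fields of rows."""
    xs = [row[1] for row in rows]
    ys = [row[2] for row in rows]
    n = len(rows)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


r_f1 = pearson(F1_PRESENT)
r_f16 = pearson(F16_ANTICIPATED)
print(round(r_f1, 2), round(r_f16, 2))
```

With only five points per item the coefficients are descriptive at best, but both are strongly positive, consistent with the reading that response rises with the share of respondents to whom an activity applies.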

Item F47 (sources of funds for schooling) provides a more
dramatic example. In the published distributions, it is treated as seven
pairs of subitems with a fund source and amount as each pair. Usable
response rates range from about 65 per cent for the "first-listed source"
to near zero per cent for the "seventh source." Obviously, the
explanation is that the number of people having seven separate sources of
funds (that they can report) is far fewer than the number having one. The
drop-off in response between the "first" and "seventh" sources forms a
nearly uniform rate of attenuation within the block.

Attenuation of this form can probably be attributed to
respondents' reliance on non-response to express "not true in my case" or
some similar meaning. We have remarked earlier on the need for greater use
of "don't know" and "doesn't apply" response options to reduce
non-response of this type. In the case of item F1, however, "does not apply
to me" is offered, and the effect still occurs. Perhaps a simple YES and
NO option pair would have been better.

                        F1: Present                 F16: Anticipated
ACTIVITY             "Applies"*  Response Rate   "Applies"*  Response Rate

Working                 64.3%       94.6%           73.6%       92.8%
Academic Course         41.5        90.0            48.2        88.9
Homemaker               16.0        86.2            21.4        84.8
Vocational Course       14.2        87.3            20.0        85.2
Military                 5.4        86.6             6.2        84.4
"Other"                  5.7        65.7             6.0        63.5

* Response "applies to me" or "expect to be doing" as percentage of
usable responses (each subitem).

Fig. 2.--Correspondence between Activity Probabilities and Item Response Rate.

A modified form of the same pattern appears for three sets
of items asking reasons for decisions about postsecondary schooling
(items F24, F29B, and F31). Although the response rates are quite
uniform for the several subitems (individual reasons) within each set, the
average rate drops by about 25 per cent between F24 and F29B and by a
like amount between F29B and F31. Each successive item (set) applies to
a smaller proportion of the sample; given inflation of eligible pools, the
constant decline in response rates probably reflects the noted tendency
to use non-response as a way of expressing "doesn't apply."

Items F24 and F29B probably seemed identical to some
respondents. Item F24 requests reasons for not continuing formal education
"after leaving high school," and F29B is identically worded except that
it stipulates "right after leaving high school" (emphasis added). We think
it likely that some of the decline in average response (to the set of reasons)
stemmed from this seeming duplication. We consider "seeming
redundancy" a second source of within-block attenuation.

Both the "probability" and the "redundancy" effects are allied
to within-block attenuation based on increasing detail. Consider, for
example, item F11, requesting information about the respondent's and
his/her spouse's 1973 income. For some, TOTAL INCOME and INCOME
FROM WAGES, SALARIES, (etc.) may seem redundant. For most,
income from "other" sources listed--interest, rental property, public
assistance, unemployment compensation--will not apply.

Response rates drop sharply, for both respondent and spouse,
over the several subitems in F11. Both improbability and seeming
redundancy probably contribute to response attenuation in this case.*

Even when "probability" and "redundancy" are less important
influences, requests for details are resisted by non-response. For
example, there are several instances where usable response rates for routing
items (with error-coded cases excluded from usable response) are
substantially higher than the rates for "detail" conditional items (with some
"error" responses included as usable).** All of the critical routing
items exhibit this pattern, as shown in Table 3. There, usable response
rates fall between items:

• F23 and F24 (any postsecondary schooling and reasons
why not),

• F25 and F26-F27 (enrolled as of 10/73 and various
details about the school in which enrolled),

• F29A and F29B (enrolled 10/72 and reasons why not),

• F30 and F31 (same school in 10/72 as in 10/73 and
reasons for changing),

• F48A and F48B-F50 (any job in 10/73 and details about
why not or what kind of job, wages, hours), and

• F54A and F54B-F56 (any job 10/72 and details about
why not or type of job, wages, hours).

* Similar influences probably operate for items F82C - F82DC and F83C
- F83DC (application for financial aid and amounts received, first-
and second-choice colleges).

** This pattern may be an artifact of the inclusion of the 40-coded
respondents in the eligible pool for conditional items. Data in Table 1
suggest, however, that the drop in response rates is generally too large to
be accounted for by the inclusion of the 40-coded respondents. In addition,
their influence on the conditional items is offset to a degree by inclusion
of the 20- and 60-coded respondents, who contribute to "usable"
response (excepting PARTIAL RESPONSE codes).

Inspection of the response rates for these examples suggests, 
not surprisingly, that respondents simply resist answering questions that 
ask for a lot of detail.

Since much of the item non-response occurs within blocks,
our examination leads us to the quite conventional conclusion that item
non-response for the instrument as a whole would be reduced if fewer
details were requested and if unlikely events were excluded. In short,
a simpler and more generally "relevant" set of questions would reduce the
problem of item non-response.

However, elimination of detail might defeat the purpose of
data collection. If details cannot be deleted and the purpose still met, it
may be useful to modify the ordering of requests for detail. The patterns
described above suggest that if unlikely options and/or "fine line" detail
were requested first in a sequence, item response might be somewhat
enhanced.

It will not always be possible to follow that format. In some
cases, the biggest drop in response rates occurs between a routing item
and a set of conditional questions; there, the order certainly cannot be
reversed. If, however, conditional items were to appear in a separate
"schedule," as suggested above, some modification in the ordering of
the details might be possible. For example, the longer time lapse
between response to the routing item and completion of the details might
give the respondent a long enough pause to permit him to search his
memory or his records for the requested information, without interrupting
an otherwise smooth task flow.*

Where non-response is based on a respondent's lack of
information, no formatting tricks are likely to affect non-response
appreciably. But a combination of easy-to-follow format, reduction of
vulnerability to random points of entry, addition of "don't know" (or
similar) options among the precoded responses, elimination of non-essential
detail, and exclusion of options that are likely to apply to very few
respondents (or, better, use of "does not apply" as an option) should serve
to enhance item (and possibly general) response.

We again emphasize that these suggestions are not to be
implemented without careful field tests. A field test of various
alternatives, using identical questions but varying the questionnaire format,
should indicate whether the suggestions will work to increase item response.

* Instructions might call for completion of detail schedules after
completion of the basic form. By thus controlling the task flow,
interruptions might be "scheduled" remotely.

We remarked in the opening pages that changes in formatting
and response options might cause noncomparability between waves of the
survey. From a "purist" standpoint, this will certainly be true. From
a practical standpoint, some changes may be so minor as to cause little
concern, but others may be very influential.

We have suggested format changes because we think future
surveys will yield very little useful information without them. The
combination of sample attrition and item non-response rates as high as many
in the first follow-up could so far reduce the available data for some items
as to make generalizations to the original population impossible.

It may be necessary to choose between data with doubtful
comparability and no useful data. Given the resources already expended
on the NLS-HS project, the wiser course seems to be acceptance of some
non-comparability. Future instruments must be designed not only to
remedy the problems we have discussed but also to minimize
non-comparability resulting from changes. Whether these objectives are
compatible can only be determined from experience. Hence, our repeated
emphasis on field testing of any instrument modifications.

Kind of information requested. Sources of non-response
identified above are all based on quantitative considerations--too many
questions in general, too many details, too few people to whom a response
option may apply, too few response options, and so forth.

We turn now to patterns which appear to be based on qualitative
considerations, i.e., items which seek information of certain
kinds which respondents are unwilling or unable to provide.

In Table 2, items are grouped by usable response rate; the
content of low-response items differs fairly systematically from that of
the high-response items, which allows us to identify certain kinds of
information that are especially likely to be omitted.

First among these is the familiar "money" item. Item non-response
tends to be high in any survey not conducted by personal interview*
when a request is made for information about incomes, expenditures,
savings, and the like. Presumably, this is grounded in part on a cultural
prohibition against divulging this kind of information and in part on
ignorance of details about personal or family finances. Since norms about the
propriety of disclosing financial information may vary among subcultures,
item non-response involving this kind of information is not easily prevented
by any one stratagem.

As a rule, respondents are more likely to comply with requests
for information made personally by an interviewer. Consequently, it is
common (though not always effective) survey practice to use telephone or
personal call-backs to obtain omitted financial information when the cost
of doing so is justified by the expected benefits of the research. We
commented above that RTI's list of "key items" for the manual edit excluded
certain critical financial items for which non-response should have
triggered call-backs, and urged that they not be omitted in the future.

Resistance based on taboos against financial disclosure is
only one source of financial omissions. Lack of information, or simple
laziness, is probably at least as important and often more easily dealt
with. It is sometimes possible to lead respondents, by small increments,
into full disclosure. This generally entails a series of very simply worded
questions about a long list of alternative sources. In addition, it helps
to ask for answers in terms of rather broadly categorized response options
rather than asking for a specific number, unaccompanied by suggestions.

* And often in personal-interview surveys as well.

The use of supplied, categorized response options will tend
to reduce accuracy for those respondents who know quite well, and are
willing to report, amounts of money. Hence, it may be necessary to
choose between accurate estimates from very few respondents and rough
estimates for a greater number of respondents. Since the accuracy of
financial detail is generally not high in survey studies, it is probably
better to provide "pegged" response options and accept their inherent
inaccuracies.

Depending on prior knowledge, it is often possible to establish
well-delimited options within a known range. For example, students
in the NLS might be supplied options in the range "zero" to "over $2,500"
with choices at $500 increments, and given an instruction phrased roughly
as: "Mark the amount which comes closest to the amount of your scholarship
income during 197-. If you had no scholarship, mark 'does not apply.'
If your scholarship was less than $250, mark zero; if it was more than
$2,750, mark 'over $2,500'." Such detailed and simply-worded instructions
extend the space requirements of the item, but may increase usable
response.
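
The rounding rule implied by this sample instruction can be sketched in a few lines of Python. This is an illustration only: the function name `nearest_option`, the label strings, and the handling of amounts falling exactly on $250 or $2,750 (which the instruction leaves open) are our assumptions and are not drawn from any NLS instrument.

```python
from typing import Optional

STEP = 500   # width of each pegged option, in dollars
TOP = 2500   # highest pegged amount before the open-ended option


def nearest_option(amount: Optional[float], had_scholarship: bool = True) -> str:
    """Return the response option a respondent would be asked to mark."""
    if not had_scholarship:
        return "does not apply"
    if amount is None:
        # Income was received but the amount cannot be recalled.
        return "don't know"
    if amount > TOP + STEP / 2:  # more than $2,750
        return "over $2,500"
    # Round to the nearest $500 (ties go up, an assumption), capped at TOP.
    pegged = min(TOP, int((amount + STEP / 2) // STEP) * STEP)
    return "zero" if pegged == 0 else f"${pegged:,}"


for amt in (100, 600, 1300, 2700, 4000):
    print(amt, "->", nearest_option(amt))
```

The sketch makes the cost of pegged options visible: every amount within $250 of a listed choice is reported as that choice, so precision is traded for a response the respondent can give without consulting records.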

Supplied options and detailed instructions may be expected to
aid recall and to relieve the respondent from the burden of giving an
impossibly precise answer. A "don't know" option should also be provided,
with instruction to use it only when there was income from the source but
the amount cannot be recalled well enough to use the ranges.*

It might also be possible to increase item response by
clustering all financial questions in one block. The advantages of this approach
are that the respondent can focus his attention on a single type of
information, can take time to collect records, and can use one item to aid his
memory on another. This last, however, may be an important disadvantage,
because interdependence among several items may transmit errors
throughout the entire block and because there may be erroneous
transference among the items (halo effect). Another disadvantage is that a
respondent may omit all financial information when thus blocked, whereas
he might supply at least some part when items are separated. Clearly,
such a scheme requires empirical field testing before being adopted for
actual data collection.

* The use of "income" in these passages is for illustration; the same
stratagem may be useful for other financial data.

Non-response on financial items is doubtless the most serious
concern (because the data are essential and potential remedies are few
and may introduce new distortions), but the second prominent type of
information lost presents almost equal difficulties. In this category are
items which request information about the respondent's reasons for doing
or not doing something.

As we have said, we suspect part of the high item non-response
here either stems from respondents' use of omission to express "no"
(or, "does not apply") or is an artifact of the routing-error or other codes
applied. In part, as well, it may be that when questions appear redundant,
as discussed above for the case of F24 and F29B, respondents believe they
have already answered the question and see no need to repeat themselves.

Response rates for "reasons" blocks are lower than 80 per
cent, and more typically lower than 60 per cent, for every such item we
examined, neither of which is true for all financial information. We doubt
that these low rates result wholly from the questionnaire design. As a
hypothesis, we suggest instead that the explanation may lie partly in the
fact that the sample consists of people leaving adolescence and entering
adulthood. This may be important because at that time personal autonomy
is a strong motive, and protection of newly-won or sought-after adult
rights is a major consideration. Acts and decisions may be made without
strong reasons other than the assertion of claims to adult status and,
once made, must be defended against adult criticisms. Under such
conditions, requests for "reasons why you did X" might be viewed as a call
for justification, to which the emerging adult may respond negatively.

For "reasons" items, as for all others in the questionnaire,
it must be remembered that the wording of the items carries the load of
"rapport" which, in a personal interview, is carried by the interviewer's
manner and expression. The questions must convince the respondent that
the information is really necessary, explain in very simple terms just
what to do for any conceivable respondent situation, and express
recognition that the respondent is doing the researcher a favor by taking time
to answer the item.

Wording of items is, therefore, more crucial for a mailout
questionnaire than it is when personal interviews are to be conducted.
The researcher must depend on "cold" written language to perform all
the persuasion and appreciation-giving tasks which are otherwise the
responsibility of an interviewer. To accomplish this, it may be
necessary to use longer, more emotive items than would be used with personal
interviews. The need for such language may be especially great for the
NLS-HS sample, for reasons like those noted.

For "reasons" items, more deferential question wording
might enhance response. The present item F24 ("Here are some reasons
others have given for NOT continuing their formal education after
leaving high school. Which of these reasons, if any, apply to you?") offers
a set of reasons acceptable to adults and demands that the respondent
claim one of these as his own. It might be better to put the question in
language suggesting belief in the autonomy, uniqueness, and privacy
rights of the respondent. Perhaps something like: "There are many
possible reasons why a person does not continue formal education after
high school. Your reasons may be among those listed below. Please
circle YES for those that apply in your case, and circle NO for those
that definitely do not. If your reasons are not listed, circle NO for all
those listed, circle YES for 'OTHER,' and give a brief description of
your reasons in the space provided." While such wording makes the
item longer, it uses simple and deferential phrasing, gives an instruction
for what to do in any case, and, by the inclusion of a free-response
choice, allows the respondent to maintain his belief in his own
uniqueness even though the chances are that he will choose one or more of the
listed reasons.

It may be that high non-response to items asking for reasons
chiefly reflects the post-adolescent status of these respondents. If this
is correct, response rates on "reasons" items should rise over time,
simply as a function of the respondents' growing self-assurance and
belief in the security of their adult status.

A third kind of information commonly omitted is that which
is not likely to be readily recalled by the respondent at the time he
completes the questionnaire. In Table 2, it can be seen that response rates
tend to be low for items seeking information about time-remote events
(either past or future), details, or about other people or organizations
external to the personal experience of the respondent. Individuals in the
age group to which the survey is addressed are typically in a period of
transition between roles and social statuses, a time when concerns about
the present and about oneself may be most important. Under such
circumstances, requests for information about the past or future, about others,
and about details demand facts that may not be "up front" in the respondent's
memory and which he may even think trivial. If such information is
essential, members of this sample will probably have to be led into
responding by aids to recall and deferential encouragement.

Summary.

A critique drawn from hindsight is, of course, easier to
produce than a foolproof instrument designed before benefit of field experience
with the specific sample in use. Our comments are not intended to
denigrate either the instrument or its designers, and we urge that they not be
so taken. Rather, we hope that the experience gained from the first
follow-up can serve as a basis for improving item response in the future.

Our strongest recommendation is that future questionnaires
be designed with greater consideration for the intended "audience" and the
mode of data collection. The instrument must be made to do the guiding
and "rapport building" work of an interviewer. The weakness of the
instrument for use with self-administration is well documented by RTI's
comment about manual edit failures:

Approximately one-third of all mail-returned questionnaires
failed manual pre-machine edit and required telephone follow-up
to some degree; less than 5 per cent of all questionnaires
resulting from individual interviews by Bureau of Census
personnel failed this edit. (User's Manual, p. 21)






It seems probable, given usable response rates ranging from 
the low 70s to low 40s for various parts of items F46 and F47 (first year 
school costs, fund sources, and amounts) that manual edit experience with 
the mailed returns would have been still worse had these items been among 
the "key" items.

Even with the manual edit and the call-back procedure,
unambiguous usable responses to item F23 (any postsecondary schooling or
training) were a woefully small proportion (87.5 per cent) of the answers
to this critical item. One wonders about the adequacy of telephone
follow-ups which permit about 12 per cent "error-coded" responses to perhaps
the most important routing item in the entire questionnaire. RTI says:

Questionnaires which failed the manual edit process, due to
having insufficient information on the "key" questions, were
examined carefully by telephone follow-up staff in
preparation for a telephone interview with the respondent. Telephone
follow-up operators were trained . . . so that they would be
capable of coping with any questionnaire-related questions a
respondent might bring up. (User's Manual, p. 21)

Evidently, either the questionnaire was so complicated that the trained 
operators couldn't prevent 12 per cent routing-pattern errors for item 
F23, or else their level of performance was quite low. Since the
telephone call-backs focused on rather few items, we think it is probably
the questionnaire's complexity (rather than the operators' performance) 
that accounts for the experience with F23. 

Our second urgent recommendation is revision of the response
options and coding categories, to encourage use of "don't know" or "not
applicable" instead of item omission and to simplify data processing
and analysis. We have discussed this in such detail that no further
comment is needed here.

Third, to combat routing-pattern errors, we have suggested
that the physical format of the questionnaire be changed. We urge that
it be assumed respondents may answer questions in any order of their
choice and may not complete the questionnaire at one sitting. These
assumptions require that routing be made, as nearly as possible,
independent of item sequence. Our formatting suggestions rest on this
requirement.

Fourth, we have suggested that much of the information sought
may be deemed trivial by the respondents, or be outside the scope of
their everyday memory. Admonitions to "think carefully" probably will
not suffice, especially in the context of a long and rather detailed
questionnaire. They must be replaced or supplemented by language and format
which leads the respondent into the areas about which information is
desired and which permits him to state that he just doesn't know, with
no implicit stigmatization for making that response.

In addition to these suggestions, we think there should be a 
frank admission (in the general appeal prefacing the instrument) that the 
information sought may seem trivial to the respondent but is nonetheless 
important to policy makers. This should serve to prevent some item
non-response.




Finally, we urge that no information be sought which is not
in fact essential for policy-making purposes. We presume that all of the
items in the first follow-up questionnaire were screened and justified on
such grounds. Nevertheless, in a period when general resistance to
survey research and to "governmental prying" are prominent, extra
precautions must be taken to assure that no unessential questions are asked.






ASSIGNMENT OF DATA TO ADJUST FOR ITEM NON-RESPONSE:
ISSUES, METHODS, AND EXPERIMENTATION 

Introduction.

Our review of several approaches to assigning data as a means
of compensating for item non-response reveals little consistency of
practice and a paucity of information pertaining to the matter. A search of
listings in a key Census publication yielded no articles relevant to our
present discussion.* Both the inconsistency of practice and the dearth
of literature suggest that the problem has not been given the methodological
attention it deserves.**

The discussion following covers a wide range of topics. We
begin with some general concerns about the wisdom of making data
assignments, continue with critiques of several approaches to the problem of
item non-response, and end with a discussion of methodological matters,
including a suggestion for isolating appropriate values which might be used
when assignment is essential.

* Bureau of the Census, Indexes to Survey Methodology Literature,
Technical Paper No. 34, Washington: GPO, 1974. Certain recent
articles on the "hot deck" and other procedures were brought to our
attention, but we have excluded the methods they discuss for reasons
stated in the main text.

** Blau and Duncan (1967), summarized below, provide an excellent
discussion of influences of item non-response on correlations between
one pair of variables.

We argue below that data assignment is to be preferred over
deleting respondents with missing data because it allows the analyst to
keep the sample representative. To the argument that assigned values
should not be used for longitudinal analyses, we reply that a carefully
chosen assigned value is better than none at all if the item non-response
is high.

We offer two major recommendations. First, we suggest 
NCES undertake a series of empirical investigations aimed at assessing 
the effects of various methods of data assignment upon fundamental 
characteristics of the data, such as the shape of distributions, variabil- 
ity, measures of central tendency, measures of change, and measures 
of intertemporal correlation. Unless such investigations are conducted,
the effects of any method can only be assessed speculatively. We think
the long-range utility of the NLS-HS data base justifies the time and
expense of such investigations.

Second, we suggest that NCES use the information from such
studies as the basis for establishing its own policies for in-house analysis,
and for preparing a manual on data assignment to be circulated among
prospective users of the NLS-HS data. It seems to us that the best way
to please all prospective analysts is the publication of a manual, to be
used by the analyst in making his own decisions about data assignment.

NCES should also consider the merits of including assigned
data in the parallel analysis tape suggested in the previous sections of
this paper. We think that prospective users might be given an option
between data files with and without assigned data.

An overview.

There are two issues which must be resolved: (1) whether
to attempt assigning values and (2) how such values may be assigned.
The decision as to whether imputations should be made is not entirely
a technical matter, and probably should be left to each research user
of the NLS-HS data, who can consider his own research objectives and
the consequences of data assignment for them.

The chief advantage of missing-value assignment is that it
helps maintain the size of the data base available for analysis and the
representativeness of the sample. The chief disadvantage is that the
enhanced data base may lull the unwary into feeling that the data are
more complete or precise than in fact they are and, therefore, placing
too much confidence in the results of analysis. In general, where static
description of a population is the goal, judicious assignment of missing
values may improve estimates of distributions. Assignments may
become dangerous, however, when the objective is estimation of sequential
or causal connections between events or states (e.g., analysis of dynamic
processes).

The most common approach to missing-data adjustment
involves assigning some category (or, more accurately, subcategory)
mean or median value to an individual missing case.* The hazard
involved in assigning subgroup means or medians to individual cases,
for longitudinal analysis, is that such a practice will certainly reduce
variability within any subgroup of cases containing assigned values and
is likely to reduce correlations of sequenced events (states).

* Of those examples we review below, only the approach taken by the
Bureau of the Census departs from this practice; there the objective
is to duplicate a known distribution rather than to establish a data
base for examining "causal" relationships.
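
The hazard can be seen in a small numerical sketch. The scores below are invented for illustration and a single subgroup is assumed, so the "subgroup mean" is simply the mean of the observed wave-2 values; nothing here is drawn from the NLS-HS file.

```python
from statistics import mean, variance


def pearson(x, y):
    """Plain Pearson correlation for two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical scores for eight respondents at two waves; None = item omitted.
wave1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
wave2 = [None, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, None]

observed = [(a, b) for a, b in zip(wave1, wave2) if b is not None]
obs1 = [a for a, _ in observed]
obs2 = [b for _, b in observed]

# Assign the mean of the observed wave-2 values to every missing case.
fill = mean(obs2)
wave2_assigned = [fill if b is None else b for b in wave2]

var_observed = variance(obs2)
var_assigned = variance(wave2_assigned)
r_observed = pearson(obs1, obs2)
r_assigned = pearson(wave1, wave2_assigned)

print(f"variance: observed {var_observed:.2f}, after assignment {var_assigned:.2f}")
print(f"wave1-wave2 r: observed {r_observed:.2f}, after assignment {r_assigned:.2f}")
```

Each assigned case sits exactly at the mean, so it adds nothing to the sum of squared deviations or to the cross-products while still counting as a case; both the variance and the wave-to-wave correlation therefore shrink.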

A more subtle problem involves compounding of assignments: 
an assigned value might become part of the basis for assigning another 
value to an individual. In the event that analyses involve examining
relationships between two variables, one could end by relating a variable to
itself (i.e., to a composite heavily loaded on the original assigned value).
Such a feat would, clearly, exaggerate estimated relationships in propor- 
tion to the number of dual-assignment cases* included in the analysis 
and the extent to which the first assigned value determines the second. 

* That is, the number of cases for which "both" variables carry assigned
values.

On the other hand, failure to assign missing values may also
distort analysis. If item non-response is systematic, both descriptions
of the population and estimates of correlation may be misleading. For
example, suppose that people with high levels of educational attainment,
but relatively low incomes, omit their incomes (perhaps to "save face").
Suppose further that people with little education, but relatively high
incomes (e.g., from illegal sources), likewise omit their incomes. Given
such a systematic pattern of item non-response, the correlation between
education and income would be artificially inflated if no estimates of
income were assigned. Assignment of missing values under such
circumstances would not guarantee accurate estimates of the degree of
association between variables; some assignment techniques described below,
however, may help avoid the problem of exaggerated correlation.
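
A minimal sketch of this example, using invented figures (years of schooling and income in thousands of dollars), shows how much the correlation can be inflated when only the "discordant" cases omit income. The data and the variable names are hypothetical.

```python
from statistics import mean


def pearson(x, y):
    """Plain Pearson correlation for two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical sample. The last two people are the discordant cases in the
# text: high education with low income, and low education with high income.
education = [8, 8, 10, 10, 12, 12, 14, 14, 16, 16, 16, 8]
income = [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 5, 12]
omits_income = [False] * 10 + [True, True]

# Correlation if everyone had answered.
r_everyone = pearson(education, income)

# Correlation computed only from those who reported income.
respondents = [(e, i) for e, i, skip in zip(education, income, omits_income)
               if not skip]
r_respondents = pearson([e for e, _ in respondents],
                        [i for _, i in respondents])

print(f"all cases: r = {r_everyone:.2f}")
print(f"income reporters only: r = {r_respondents:.2f}")
```

The analyst who simply drops the two non-reporters sees a far stronger relationship than exists in the full sample, which is the distortion the paragraph describes.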

Since distortion can occur with or without assignment of
missing values, no general statement about the wisdom of assignment can be
made. The matter must be left to each researcher, who will take 
responsibility for safeguarding his analyses from the particular kinds 
of distortion least acceptable within the framework of his problem. 

For these reasons, we recommend that NCES not attempt to 
provide only tapes with data augmented by assignment of missing values. 
NCES might, however, choose to offer two versions of NLS data tapes, one 
without assignments, one with assignments appropriately coded as such, 
i.e., the parallel analysis tape discussed previously. A better approach
to the problem is for NCES to provide data users with a technical manual 
containing detailed discussions of various possible "fixes" for item non- 
response. The manual should include instructions for carrying out those 
procedures deemed most suitable for different uses of the data base and/or 
citations of sources for such detailed instructions. 

In its own analyses, of course, NCES may wish to adopt
some standard policy on assignment of missing values, so that
comparability among its various studies will be maintained (and to forestall erratic
treatment of the problem which might result from personnel turnover
within NCES). NCES staff analysts or their contractors should be free
to decide whether or not assignments should be made in the context of
their particular studies, but procedures for making assignments should
be standardized. It may not always make sense to assign values for
missing cases but, if it does, the procedure should be uniform from study to
study. Our second recommendation is that analysts working with the
NLS-HS data under NCES auspices should be given discretion with regard
to whether or not data are to be assigned, but little discretion as to how
assignments are to be made.*

* Perhaps a set of limited options could be provided, permitting some
discretion as to choice of method.

Both of the foregoing recommendations assume the existence
of knowledge about the effects of various methods of assignment upon the
results of analyses performed with the NLS-HS data. Since it is not
clear that adequate knowledge presently exists, our third
recommendation is that NCES undertake empirical studies aimed at examining the
consequences of applying any assignment procedure to the NLS-HS data.
Any decisions about standardization of NCES procedures, or any
recommendations included in a data users' manual, should be based on rather
extensive experimentation with the data base itself, since either kind of
judgment is likely to -- and should be intended to -- stand for some relatively
long period. Decisions based on theory alone or on experience with
assignment procedures used on other data bases may prove inadequate
for NCES's in-house standardization or for a users' manual.

Assignment practices: review and critique.

The following discussion focuses on general approaches to
the problem of item non-response.

Procedure. Our discussions of assignment procedures are
limited to general descriptions, sufficient to provide a basis for
considering possible consequences of each approach; details are available from
the sources cited. The discussion focuses on longitudinal, rather than
cross-sectional, analyses of the data because the primary purpose of the
surveys of 1972 high school seniors is longitudinal analysis. We are
more concerned, for example, with the effects of data assignment on
correlation of individual values over time than on associations among
subcategories of respondents.*

* The longitudinal orientation does not, of course, exclude concern for
the effects of assignment upon distributions. Trend analyses as well
as panel analyses may be considered longitudinal by some
methodologists, and it is not our intention to preclude such a definition.

The discussion is based on an examination of approaches
used in several large-scale surveys, all but one sponsored by the federal
government and all involving samples intended to represent major
segments of the U.S. population. Four of the surveys employ panel designs;
three of these are ambitious efforts to follow panel members for several
years. The analysts whose approaches are discussed have, in the main,
held concerns about data assignment like those faced by prospective
analysts of the NLS-HS surveys.

Needless to say, the studies discussed employed machine
processing* of large volumes of data, and most have engaged the efforts
of many analysts. It may be supposed that the approaches used were
based on well-informed judgments, made by many qualified professionals,
about the relative merits of available treatments.

While the number of surveys discussed is small, those
examined represent major efforts with a degree of complexity comparable
to the NLS-HS survey(s), and may, therefore, be considered a reasonable
judgmental sample of such surveys. This "sample" includes Project
SCOPE, Project TALENT, the Educational Opportunity Survey (Coleman
Report and subsequent Office of Education analyses), the OEO-University
of Michigan Panel Study of Income Dynamics, the DOL-Ohio State
University National Longitudinal Surveys of Labor Market Experience, and the
1970 U.S. Census of Population.

* An important qualification because machine processing is involved
in subsequent considerations of advantages and disadvantages of the
various methods. Special coding may be required in some instances
to forestall certain "errors" which can arise in machine processing.
On the other hand, some methods are feasible options only with
access to appropriate computer programs.

Varying approaches. Our examination of these survey studies
shows that no single approach has been accepted for general application.
The analysts of Project TALENT, the Coleman Report, and the
DOL-OSU labor market surveys make no assignments (some give reasons,
others appear to ignore the matter), while analysts for Project SCOPE,
the OEO-UOM income dynamics study, the OE analysts of the EOS survey,
and the Census statisticians each use a different method of assignment.
Excepting the Census, which uses the "hot deck" (a random match)
procedure, adjustments are always made by some variation of assigning a
mean (or median) value. The variations lie in what mean is assigned
and/or how the subcategory whose mean is to be assigned is chosen.
Project TALENT.

Sources: (1) The Project TALENT Data Bank: A Handbook (J. G.
Claudy, ed.); Palo Alto, Calif.: American Institutes
for Research, 1972.
(2) Flanagan, J. C., M. F. Shaycroft, J. M. Richards,
Jr., and J. G. Claudy, Five Years After High School;
Palo Alto, Calif.: American Institutes for Research
and the University of Pittsburgh, 1971.

The TALENT staff seems to have ignored the problem of item
non-response, though this is something of an exaggeration. The handbook
(1) makes a brief mention of the problem:

A potential problem with any Data Bank study is that of
missing data. A great deal of information was collected on each
participant in 1960 and virtually every case is missing a few
data items. In correlation and related types of analyses these
missing data can seriously affect the results. There are
several ways that researchers can handle this problem: (1)
completely eliminate from the study any case with missing
data on the variable of interest; (2) base the individual
summary statistics on all cases for whom the variables of
interest are available (e.g., use a missing data
correlation program); (3) substitute the sample or population mean,
median or some other value for the missing value. The
researcher also has the option, of course, of specifying
some other procedure. (1:21)
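
The practical difference between the handbook's first two options is the number of cases each statistic rests on. The sketch below uses invented records and our own helper name `cases_with`; we read option (1) as casewise deletion across every variable in the study and option (2) as using, for each pair of variables, all cases complete on that pair, which is an interpretation rather than a quotation.

```python
from itertools import combinations

# Hypothetical records for five participants; None marks a missing item.
rows = [
    {"a": 1, "b": 2, "c": 3},
    {"a": 2, "b": None, "c": 4},
    {"a": 3, "b": 5, "c": None},
    {"a": 4, "b": 6, "c": 7},
    {"a": None, "b": 8, "c": 9},
]
variables = ["a", "b", "c"]


def cases_with(rows, names):
    """Return the rows that have a value for every variable in `names`."""
    return [r for r in rows if all(r[n] is not None for n in names)]


# Option (1): drop any case missing any variable used in the study.
n_casewise = len(cases_with(rows, variables))

# Option (2): each pair of variables keeps every case complete on that pair.
n_pairwise = {pair: len(cases_with(rows, pair))
              for pair in combinations(variables, 2)}

print("cases kept under option (1):", n_casewise)
for pair, n in n_pairwise.items():
    print("cases available for", pair, "under option (2):", n)
```

Option (2) retains more cases, but each coefficient in the resulting matrix is then based on a somewhat different set of respondents.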

For their own work, the TALENT staff appears to depend
chiefly on the deletion of missing-data cases. Discussing the computation
of the "Socioeconomic Index" (a key control variable drawn from nine
Student Information Blank items), the handbook says:

Items to which a student gave a non-applicable response were
not included in the computation of his . . . socioeconomic
index . . . Each student's response to each of these /SIB/
items (excluding those items which he omitted or to which
he gave a "not applicable" response) were converted . . .
to standard scores /which were subsequently/ used to
compute his Socioeconomic index score . . . (1:46-49 passim.)



Clearly, no assignments were made for missing data in the computation 
of this critical control variable. 

In the research reports presented in (2), there appear to be
two distinctive approaches:

a. For descriptive studies, tabulations include a residual
category containing all non-specific cases (e.g., "don't know" plus item
non-response);

b. For studies of correlation, the data base is simply the
cases for which appropriate data are available.

Tabulations (including percentages) are generally based on data weighted
to represent the 1960 high school population. Correlational analyses
tend to use unweighted data, unadjusted for item non-response. These
practices follow the dictum given by Shaycroft and Richards:

. . . assignment of appropriate weights is a crucial step
in data analyses the results of which are supposed to be
accurate estimates of numbers of cases or percentages of
cases in specified categories in the corresponding segment
of the national population . . . For many other kinds of
analyses, where what is sought is relational data and the
answers to questions about relationships between various
variables, weighting cases differentially is of far less
importance and in some cases probably quite undesirable.
Correlation matrices are an example of kinds of data
analysis in which the use of unweighted data is generally
quite satisfactory. (2:1-15)*

Most of the correlational studies reported in (2) drop missing 
cases from the data base, in accord with the advice given by Shaycroft 
and Richards and the second option noted in the handbook (1). 

We question both the advice and the practice. First, we know 
of no statistical reasons why weighted data should not be employed in 
correlational analyses.** Second, it should be recognized that dropping 
missing data cases in effect weights the data, because some members of 
the sample are "assigned" a weight of zero. Thus, TALENT analysts 



* The page numbering system used in (2) opens the possibility of con- 
fusion in citations. Pages are numbered with a digit for the chapter 
followed by a dash and one or more digits for the page within that 
chapter. This quotation is, then, to be found on the fifteenth page of 
chapter one. 

** That it can be is indicated by Nie et al., who have included a weighted 
data correlation subroutine in the Statistical Package for the Social 
Sciences, and by use of weighted data in the regressions run for the 
Coleman Report (Coleman et al., 1966:571). They may object on 
grounds like ours (lack of independence) or because they fear errone- 
ous attribution of statistical significance to correlation coefficients, 
which might result from inflated sample size. 






have used weighted data even while objecting to the practice. 
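
The point that dropping missing-data cases amounts to assigning them a weight of zero can be checked with a short sketch. The Python below is an illustration only and is not drawn from the TALENT documentation; the function name `weighted_corr` and the data are hypothetical.

```python
from math import sqrt

def weighted_corr(x, y, w):
    """Weighted Pearson correlation; w holds one non-negative weight per case."""
    total = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / total
    my = sum(wi * yi for wi, yi in zip(w, y)) / total
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    vx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    vy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    return cov / sqrt(vx * vy)

# Hypothetical data: None marks an item non-response on y.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.0, None, 3.5, 3.0, None, 6.0]

# (a) Drop the missing-data cases, then correlate with equal weights.
complete = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
r_dropped = weighted_corr([c[0] for c in complete],
                          [c[1] for c in complete],
                          [1.0] * len(complete))

# (b) Keep every case, but give the missing-data cases a weight of zero.
w = [0.0 if yi is None else 1.0 for yi in y]
y_filled = [0.0 if yi is None else yi for yi in y]  # placeholder; its weight is zero
r_zero_weight = weighted_corr(x, y_filled, w)
```

Both routes give the same coefficient, which is the sense in which an "unweighted" complete-case correlation is already a weighted one.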

Where it may be possible to do so, it would seem preferable 
to weight the available cases to compensate for both general and item non- 
response. To do so would permit the analyst, rather than the non-respon- 
dents, to control the representativeness of the sample from which correlation 
estimates are derived. 

Whether or not it is feasible to use weighting to compensate 
for item non-response is a separate matter. In the most common instance, 
a substantial amount of missing data comes from people who omit answers 
for only a few of several variables to be related. This alone would imply 
a complex effort to assign weights. In addition, the common case implies 
that any one individual will have answered at least one of the items. This 
makes weighting nearly impossible, because variables for each individual 
are not independent. Weights assigned to compensate for missing data 
on one variable would result in mis-weighting other variables where a 
response is present. When every missing-data case lacks responses on 
each variable under study, weighting might be acceptable, but we think 
this is a rare instance. Even in such instances, the amount of effort re- 
quired to determine an appropriate weight would probably prove prohibitive. 

The whole point of adjusting data for any kind of non-response 
is to improve the representativeness of a sample, hence the accuracy of 
population estimates. Faced with varying rates of item non-response, an 
analyst must decide whether some adjustment he can make will eliminate 
systematic biases that result from non-response. If weighting introduces 
systematic biases, it may be as great a source of distortion in population 
estimates, for correlation coefficients or other summary measures, as 
is the non-response for which it is intended to compensate. 

The TALENT analysts appear to overlook the self-weighting 
which results from item non-response. Their only suggested method for 
adjusting data (assigning the population or sample mean) would result in 
very crude estimates indeed, and would tend to reduce most correlations. 
Hence, they prefer to calculate correlations with "unweighted" and un- 
assigned data. For the reasons we have outlined, and because there are 
now ways of assigning much more refined values, we find the TALENT 
approach to item non-response not a suitable model for the NLS-HS sur- 
veys.* 

Educational Opportunity Survey - Coleman Report. 
Source: James S. Coleman, et al., Equality of Educational Oppor- 
tunity. Washington, D.C.: (DHEW-OE) National Center 
for Educational Statistics, 1966. 
The Coleman Report approach to item non-response is given by a single 
statement "buried" in the technical appendix: 



* Their policy of showing item non-response as a separate category in 
tables is a point in their favor. Unfortunately, they have mingled 
"don't know" and other residual categories with actual non-response, 
thereby losing the analytical value of "don't know." Summary meas- 
ures for such a category will be largely meaningless. 






The estimated totals, averages, and proportions reported 
in section 2 of the report have been developed by the use 
of a ratio estimation procedure. This procedure was car- 
ried out for each of the five racial composition groups in 
each of the primary sampling units. These weighted area 
statistics were then combined so as to produce the desired 
regional and national estimates . . . No allocations or im- 
putations were made for item non-response. Averages 
were calculated only on the schools who responded on the 
item. Proportions were calculated on all schools, with 
the proportion not responding calculated as a separate 
category. (Coleman et al., 1966:558; emphasis added.) 

So far as we are able to determine, the underscored statement stands 
without accompanying justification as the sole comment on treatment of 
item non-response. Although some attention is given to the problem of 
respondent reporting error (pp. 568-70), there is no comparable state- 
ment on item non-response, hence we must presume no special effort 
was made to assess potential biases from that quarter. 

The technical discussion of the methods for the regression 
analyses states only that a pairwise-deletion procedure was used in cal- 
culation of correlation matrices: 

Missing data was /sic/ treated as follows: correlations 
were calculated by use of each case for which both variables 
in the correlation were present. Thus, a case with a miss- 
ing observation was deleted only for those correlations in 
which this variable was involved. (Coleman et al., 1966:571-72) 
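
The pairwise-deletion rule quoted above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Coleman Report's actual program: the function names `pearson` and `pairwise_corr`, the variable names, and the data are hypothetical, and `None` stands for a missing observation.

```python
from math import sqrt

def pearson(pairs):
    """Pearson correlation for a list of (a, b) pairs with no missing values."""
    n = len(pairs)
    ma = sum(a for a, _ in pairs) / n
    mb = sum(b for _, b in pairs) / n
    cov = sum((a - ma) * (b - mb) for a, b in pairs)
    va = sum((a - ma) ** 2 for a, _ in pairs)
    vb = sum((b - mb) ** 2 for _, b in pairs)
    return cov / sqrt(va * vb)

def pairwise_corr(data):
    """For each pair of variables, use every case on which both are present.

    data maps a variable name to a list of values (None = missing).
    Returns {(name_a, name_b): (correlation, number_of_cases_used)}.
    """
    names = list(data)
    out = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            pairs = [(u, v) for u, v in zip(data[a], data[b])
                     if u is not None and v is not None]
            out[(a, b)] = (pearson(pairs), len(pairs))
    return out

# Hypothetical five-case data set.
data = {
    "x": [1.0, 2.0, 3.0, 4.0, 5.0],
    "y": [2.0, None, 4.0, 5.0, 7.0],
    "z": [5.0, 3.0, None, None, 1.0],
}
result = pairwise_corr(data)
```

Each coefficient in the resulting matrix rests on a different subset of cases (here four, three, and two), which is one reason the approach deserves more justification than the Report gives it.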

For constructed variables (indexes) used in the regressions, 
each item employed in the index was standardized with mean equal to 
zero. Within this schema, item non-responses were assigned zero, 
which, as the authors note (p. 572), is equivalent to assigning the 
population mean before standardization. 

It is worthy of note that the Coleman Report obviously has 
no fixed policy for treatment of missing data. For descriptive work, no 
assignments are made and the non-response is treated (or so we are 
told) as a separate residual category, which is included in the bases for 
percentage calculation. For correlational analyses, however, two other 
approaches are taken: pairwise deletion in the case of simple variables 
included in zero-order correlation matrices, and assignment of the popu- 
lation mean to variables upon which constructed indexes are based. 

There is no way of assessing, short of extensive reanalysis 
of the raw data, how this rather casual treatment of item non-response 
may have affected the interpretation of numerical results, but one may 
assume that there was some effect. The total number of assignments is 
likely to rise when an index constructed from many variables is employed 
because it is likely that different people will omit given items, causing 
the number of cases with at least one assignment to rise as the number 
of variables in the index increases. Consequently, when an indexed vari- 
able is correlated with a single variable, assignment of values for item 
non-respondents is likely to influence the relationship more than it would 
when two simple variables are correlated. If there were any substantial 
number of missing-data cases included in the indexed variables, correla- 
tions between those variables and simple variables having complete data 
would probably be reduced.* 
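
The tendency of mean assignment to reduce a correlation can be seen in a small sketch. The data are hypothetical and the result holds for this example rather than as a general theorem; a single variable stands in for the index, and `pearson` is a helper written for the illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation for two equal-length lists with no missing values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

# Hypothetical data: x is complete, y has two item non-responses (None).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.5, None, 2.5, 4.5, 4.0, None, 7.5, 8.0]

# Correlation on the cases with complete data.
observed = [(a, b) for a, b in zip(x, y) if b is not None]
r_complete = pearson([a for a, _ in observed], [b for _, b in observed])

# Correlation after assigning the observed mean of y to the missing cases.
mean_y = sum(b for _, b in observed) / len(observed)
y_assigned = [mean_y if b is None else b for b in y]
r_assigned = pearson(x, y_assigned)
```

The assigned cases contribute nothing to the covariance (their deviation from the mean of y is zero) while still adding to the variance of x, so the coefficient falls.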

Given these considerations, and the heavy dependence of the 
Coleman Report on correlational analyses, it would seem that some more 
consistent policy for treatment of missing data should have been followed. 
It should be clear that a policy for treatment of item non-response must 
be decided before analysis begins and must take account of the plan for 
analysis. The Coleman Report fails on the last criterion.** 

National Longitudinal Survey of Labor Market Experience. 
Sources: (1) Parnes, H. S., R. C. Miljus, R. S. Spitz and Associ- 
ates; Career Thresholds: A Longitudinal Study of the 
Educational and Labor Market Experience of Male 
Youth. Manpower Research Monograph No. 16 (three 
volumes). Washington: U.S. Department of Labor, 
1970, 1971. 

(2) Shea, J. R., R. D. Roderick, F. A. Zeller, A. I. 
Kohen and Associates; Years for Decision: A Longi- 
tudinal Study of the Educational and Labor Market 
Experience of Young Women (Vol. I). Manpower Re- 
search Monograph No. 24. Washington: U.S. 



* Unless, of course, variables with assigned values exert no influence 
on the index value--hardly a likely event. 

** In fairness, it must be said that the Coleman Report was a very ambi- 
tious project carried out under severe time constraints imposed by the 
U.S. Congress. Only the most pressing analytical problems could be 
given close attention, and item non-response may well have been the 
least of the problems faced by the analysts. 




Department of Labor, 1971. 
(3) The National Longitudinal Surveys Handbook. Columbus, 
Ohio: Center for Human Resource Research, The Ohio 
State University, October 1975. 

The same general statement regarding assignment of missing data appears 
in each volume of source (1) and, in modified form, in sources (2) and 
(3). It may be presumed that the same approach has been taken consistently. 
The Ohio State group follows a more consistent policy than either the 
TALENT group or the Coleman Report, in that item non-response is 
apparently deleted throughout: 

In calculating percentage distributions, cases for which no 
information was obtained are excluded from the total. This 
amounts to assuming that those who did not respond to a par- 
ticular question do not differ in any relevant respect from 
those who did--a reasonably safe assumption for most vari- 
ables especially when the number of no responses is small. 
(1, vol. 1:3-4) 

None of the (questionnaire) edits included an allocation rou- 
tine which was dependent on averages or random information 
from outside sources, since such allocated data could not 
be expected to be consistent with data from subsequent sur- 
veys. However, where the answer to a question was obvious 
from others in the questionnaire, the missing answer was 
entered on the tape. 
. . . 

Further, some of the status codes which depend on the an- 
swers to a number of different items, were completed using 
only partial information. The most obvious example is the 
current employment status of the respondent . . . This is 
determined by the answers to a number of related questions. 
However, if one or more of these questions is not completed 
but the majority are filled and consistent, the status is de- 
termined on the basis of the available responses. This gives 
rise to an artificially low count of "NA's" for certain items. 
(1, vol. 1:211-12) 






The justification for non-allocation is a reasonable but weak 
one. Its weakness lies in the implicit assumption that "no data" will be 
more consistent with data from subsequent surveys than would an assigned 
value, an assumption which we think untenable except for those cases in 
which the respondent consistently fails to respond to an item. (Indeed, if 
there is any substantial number of chronic item non-respondents, an 
analyst should become suspicious about systematic bias in the data and 
examine their records quite closely, rather than simply deleting them 
from the data base.) Where non-response to a given item is not chronic, 
it seems likely that a properly assigned value will have more research 
utility, because it preserves the base for sequential data, than a retained 
non-response. Some methods of determining what values are to be 
assigned provide rather refined estimates, so that the degree of error 
in repeated-measures correlations can be rather small. 

That portion of the justification given for deleting "small num- 
bers" of item non-response from percentage bases is acceptable. If the 
proportion of item non-respondents is quite small, it will matter little 
how far their true state of affairs departs from that of the respondents, 
since the summary statistics (with some exceptions) can be little affected 
by small proportions. 

But Mayeske (1972) has demonstrated that it is unsafe to assume 
the equivalence of respondents and non-respondents for some kinds of data. 
Consider variables for which the possible empirical range is very great 
and the item non-respondents are drawn systematically from the tails of 
the distribution. In such a case, means and variances might be markedly 
altered by omission of the item non-respondents. 

Such effects would be trivial in most cases, although one 
might be concerned if, say, all item non-respondents, constituting (e.g.) 
4 per cent of the cases, happened to make up an extreme class, such as 
the class of all families with annual incomes above $50,000 or of persons 
with postgraduate degrees. Deleting item non-respondents in such a case 
would yield a percentage distribution in which those classes would be 
empty. It may be that such an unusual form of systematic bias never 
occurs, but the fact that it could occur should make a researcher wary 
of adopting a policy on grounds like those put forward by the Ohio State 
group. 

The TALENT group's approach, even though imperfect be- 
cause item non-respondents are mingled with other unspecific responses, 
is preferable: include non-response in the percentage base, and report 
it as a separate category. This at least gives the research consumer 
some idea of the possible kinds or degree of error which may exist in the 
reported distributions, and gives him the opportunity to recalculate a 
distribution using only completed cases. 
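
The contrast between the two reporting policies can be made concrete with the 4 per cent example given above. The sketch below is illustrative only; the function name `percent_distribution`, the class labels, and the counts are hypothetical.

```python
from collections import Counter

def percent_distribution(values, keep_missing):
    """Percentage distribution of a categorical item; None marks item non-response.

    keep_missing=False deletes non-respondents from the base;
    keep_missing=True reports them as a separate category.
    """
    if keep_missing:
        labels = ["no response" if v is None else v for v in values]
    else:
        labels = [v for v in values if v is not None]
    counts = Counter(labels)
    return {k: 100.0 * c / len(labels) for k, c in counts.items()}

# Hypothetical sample of 100: the 4 families above $50,000 all omit the item.
reported = ["under $50,000"] * 96 + [None] * 4

deleted = percent_distribution(reported, keep_missing=False)
shown = percent_distribution(reported, keep_missing=True)
```

With deletion, the distribution shows every family under $50,000 and the top class is empty; with non-response shown as a category, the reader at least sees that 4 per cent of the sample is unaccounted for.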

Non-assignment: a summation. The three examples just 
critiqued suggest that justifications for abstaining from assigning values 
to missing cases tend to be absent or weak. When a researcher has only 
the crudest of methods for determining values, or when item non-response 
is quite low, he might be better off not to assign. But even in the case of 
analyses based on comparison of individual scores over time, the fact 
that correlations may be affected by assigned values seems less a prob- 
lem than the alternative fact, that without assignments there can be no 
use of cases where either of the correlated values is missing. With or 
without assignment, estimates of population relationships may be biased, 
but the researcher who assigns values according to some well-grounded 
scheme stands a better chance of avoiding serious bias than his colleague 
who allows the non-respondents to determine what portion of the population 
goes unrepresented. It would seem to be a rather exceptional case, then, 
in which no assignment of missing values should be made. 

The strongest justification for non-assignment would seem to 
exist when comparisons are to be made among irreducibly small subsets 
of a large data base, i.e., those subsets for which no internal differentia- 
tion can be made and whose means or medians would, therefore, serve 
as the assigned values for all missing cases within the subset. In such 
an instance, little would be gained by adding to the number of cases and 
one would lose variability which could be essential to the analysis. Even 
in such cases there might be practical reasons, such as maintaining equal 
cell frequencies for ANOVA, why the researcher would prefer to lose 
some variability rather than revise his analysis plan or face major diffi- 
culties in the processing or interpretation of data. 




Perhaps the best reasons for preferring assignment lie in 
the realm of the researcher's control over the representativeness of his 
data. Abdication of control over sample representativeness, which is 
implicit when no assignments are made, seems hardly desirable as a 
research stratagem when there are workable alternatives. In the follow- 
ing section we consider four cases where some form of assignment was 
used. 

1970 Census. 

Source: U.S. Bureau of the Census; Census of Population: 
1970; Vol. I, Characteristics of the Population, 
Part 1, United States Summary - Section 2. Washing- 
ton: Government Printing Office, 1973 (Appendix C, 
pp. App. 67-69) 

The decennial census is intended to provide cross-sectional, descriptive 
data. In this respect, it differs from the others under consideration and 
from the proposed uses of the NLS-HS survey. For present purposes, 
however, the Census must be considered since its approach to dealing 
with item non-response is a standard that may be presumed grounded on 
well-conceived and well-executed policy. This presumption, of course, 
does not imply that the Census technique is necessarily applicable to other 
data bases. As it happens, however, the Census approach corresponds 
in important respects with the approach taken by the University of Michi- 
gan Institute for Social Research, which we consider next. 




Census allocates missing data in two ways, only the second 
of which concerns us here:* (1) by substitution of a complete record of 
one household for that of another dwelling unit which is determined to be 
occupied but from which no response was forthcoming, and (2) by using 
a "hot deck" (random match) as the source of a value for missing data in 
item non-response. 

The Census version of this random matching procedure is 
possible only with special computer programs that cause selected com- 
plete records to be temporarily stored in memory.** Each such stored 
record is replaced by the next-appearing record that matches it on selected 
characteristics, so that the particular record in storage at any point in 
time is a function of the order in which data are processed. If the order 
is random, the record in storage at any moment is to all intents randomly 
chosen from the population of all records having a prescribed combination 
of characteristics. 

The combination of matching characteristics depends upon 
what item of information is involved as the dependent variable. In the 
case of income (for example) characteristics such as sex, race, age, 
geographic location, education, occupation, and the like might be considered 
relevant variables, and records representing all possible combinations 

* The first technique applies to general, rather than item, non-response 
and is therefore outside the scope of this commentary. 

** However, the basic "hot deck" approach, involving random selection 
of a value, might conceivably be used by almost any researcher. 


of coded values would be held temporarily in memory. These "reference 
records," as we shall call them, are records containing values for the 
variable of interest (e.g., income). At any given moment, then, the 
computer will have on file a record containing all information needed (i.e., 
in use) to assign a value for income.* 

When a record missing a dependent-variable value appears, 
it is matched (as closely as possible) with one of the "reference records," 
and the value for the "reference record" is assigned to the missing data 
case. In the ongoing example, a record lacking income data would be 
matched with a reference record on the basis of the predictive character- 
istics, and the income contained in the reference record would be assigned 
as the income of the item non-respondent. The item non-respondent's 
revised record then replaces the initial "reference record," and itself 
is stored in memory until the appearance of another complete record with 
the appropriate combination of "match" variable values. 
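
The sequential logic just described can be sketched in a few lines. This is a simplified illustration, not the Census Bureau's program: it assumes exact matching on the "match" variables (the Census matches "as closely as possible"), leaves a case unassigned when no reference record has yet been stored, and uses hypothetical names (`hot_deck`, `sex`, `educ`, `income`, `assigned`) and data.

```python
def hot_deck(records, match_keys, target):
    """Sequential hot-deck sketch.

    records are processed in file order. For each combination of match_keys,
    the most recent value of target is held as the 'reference' value. A record
    missing target receives the reference value currently in storage.
    """
    reference = {}  # match-variable combination -> value currently in storage
    out = []
    for rec in records:
        rec = dict(rec)  # work on a copy; the input is left untouched
        key = tuple(rec[k] for k in match_keys)
        if rec[target] is None:
            if key in reference:
                rec[target] = reference[key]
                rec["assigned"] = True   # flag the value as assigned
            else:
                rec["assigned"] = False  # no donor seen yet; left missing
        else:
            rec["assigned"] = False
            reference[key] = rec[target]  # this record becomes the reference
        out.append(rec)
    return out

# Hypothetical file, in processing order.
records = [
    {"sex": "F", "educ": "HS", "income": 9000},
    {"sex": "M", "educ": "HS", "income": 12000},
    {"sex": "F", "educ": "HS", "income": None},
    {"sex": "F", "educ": "HS", "income": 9500},
    {"sex": "M", "educ": "HS", "income": None},
    {"sex": "F", "educ": "HS", "income": None},
]
result = hot_deck(records, ["sex", "educ"], "income")
```

Because the value in storage depends on file order, shuffling the file before processing is what makes the assignment effectively random within each matched subset.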

The end result of the "hot deck" procedure, applied to a very 
large data pool, is that the mean and distribution for the assigned cases 
within any characteristics-defined subset will approximate that for the 
known-data subset to which it has been matched. This follows from the 
fact that the assigned values are in effect randomly selected from within 



* Needless to say, for dependent variables other than income, there 
would likewise be sets of "reference records" with complete data on 
the dependent and the "match" variables. 







the range of the known cases for a subset of the population.* 

This approach clearly differs from the more common method 
of assigning a known-subset mean to all item non-respondents, since the 
latter eliminates all variability among the assigned cases although, of 
course, it too yields the known-subset mean as the mean of all assigned 
cases. 

It is to be emphasized that the Census approach is designed 
for purposes of cross-sectional description and to offset consistent biases 
in item non-response. Thus, if the poorly educated portion of the popu- 
lation tends to omit (in our example) income information, the "hot deck" 
procedure compensates for this tendency. It yields improved estimates 
of the proportions of the population with lower incomes, because it takes 
advantage of known correlations between income and "match" variables 
such as sex, race, education and the like. 

It should be clear that the Census approach would not be suit- 
able for panel studies, because of the random nature of values assigned 
to individual missing-data cases. Were it used with panels, repeated- 
measures correlations could be affected quite markedly, for the degree 
of error in the assignment of any one case is likely to be large. (Even 
though the aggregate error involved is no more likely to be large than is 


* The "reference records" may be regarded as having been weighted 
by a factor of 2, since their criterion values are entered twice into 
the data pool. 




the case when subset means are assigned.) 

It is entirely possible that, in our example, an item non- 
respondent whose actual income is in the top five per cent of the subset 
distribution will be assigned a value from the bottom five per cent, intro- 
ducing a large error for that record. Though the incidence of such extreme 
errors would not be large, it may be supposed that 68 per cent of the 
assigned cases could have values departing from their true value by one 
to two, and the other 32 per cent could have values erring by three to four, 
standard deviations. The consequences of such potentially great errors 
for dynamic analyses are apparent. As stated in Parnes et al. (1971), 
". . . such allocated data could not be expected to be consistent with data 
from subsequent surveys." It is this consideration which rules out direct 
application of the "hot deck" method for panel studies. 

Panel Study of Income Dynamics. 
Sources: (1) Finlayson, S. (ed.) A Panel Study of Income Dynamics: 
Study Design, Procedures, Available Data -- 1968-1972 
Interviewing Years (Waves I - IV). Vol. I. Ann Arbor: 
University of Michigan, Institute for Social Research, 
1972. (pp. 273-321) 
(2) Morgan, J. N., K. Dickinson, J. Dickinson, J. Benus, 
and G. Duncan; Five Thousand American Families -- 
Patterns of Economic Progress. Vol. 1: An Analysis 
of the First Five Years of the Panel Study of Income 
Dynamics. Ann Arbor: University of Michigan, Institute 
for Social Research, 1974. 
Of all the studies examined, the most extensive discussion of item non- 
response, and we think the most sophisticated approach, is given in the 
above study. The ISR employed a staged approach to data assignment, 
running from judgments based on other information supplied by the item 
non-respondent* to assignments from tables calculated by a statistical 
algorithm called AID (Automatic Interaction Detector). Assignments were 
assessed and coded in "minor" and "major" categories, according to the 
degree of probable error: minor assignments were those for which prob- 
able error was under $300 or less than 10 per cent of the value of the 
variable, and major assignments were those for which error was at least 
$300 or 10 per cent of the value of the variable. 

The use of the AID procedure is the most interesting aspect 
of the assignment procedure. As described in Morgan et al. (1974:359-62),** 
the AID procedure resembles the Census approach in that it uses 
a set of predictor variables which describe some subset of the sample of 
known responses, whose characteristics can be employed to determine 
a criterion value for assignment to missing cases. 

* Editing assignments were, like those described by Parnes et al. (1971), 
made on the basis of examining other responses in the interview proto- 
col or by reference to information supplied in earlier interviews. 

** A more complete description is provided in Sonquist et al. (1973) and 
a description of the related THAID appears in Morgan and Messenger 
(1973). 



The AID procedure differs importantly from the Census "hot 
deck" procedure in three respects. First, whereas the Census employs 
a fixed set of predictor variables for any one criterion, the AID program 
may yield differing sets for every value of the first (most important) 
predictors and likewise for each subsequent predictor. Thus, the AID, 
by taking advantage of interaction among the predictors, is capable of 
producing a very refined subset structure for use in "matching" item non- 
respondents. Second, whereas the selection of predictors employed by 
the Census is based upon externally known correlations, the AID proce- 
dure searches and substructs the given data set to locate those subsets 
of characteristics which predict best for the body of data under immediate 
consideration. Third, whereas the Census procedure assigns randomly 
matched individual values of the criterion variable, the AID procedure 
yields subset means which may then be assigned to missing data cases. 
An example of the output of the AID procedure is shown in Figure 3, as 
it appears in Morgan et al. (1974:48). 

The meaning of the variables in the example need not concern 
us here. The focus of interest is the variation in predictors at the third 
and fourth levels of the chart (race vs. ability test scores at level 3, four 
different variables at level 4) and the marked variation in criterion means 
for every contrasted value* of a given variable. 

* Continuous values are bracketed into a small number (5-10) of cate- 
gories. The AID program examines all possible dichotomous splits 
for each independent variable. 

[Figure 3. Example of AID output: a tree of successive dichotomous 
splits of all families with a different head (24% of all 1972 families), 
first by marital status (not married now vs. married now), then by race 
(white vs. nonwhite) or by test scores (low vs. higher), and at the fourth 
level by age (18-24 years old), index of achievement motivation (low vs. 
high), or age of youngest child (5 or older vs. no children or youngest 
under 5), with a criterion mean reported for each subgroup. The numeric 
entries are not legible in this copy. Source: Morgan et al. (1974:48).] 

It is evident that assignment of mean scores for missing data 
would be much more precise with breakouts produced by the AID program 
than with more ordinary approaches which use a fixed set of descriptors. 
In Figure 3, the breakouts shown are those which best differentiate the 
sample, to account for the greatest portion of variability in the criterion 
score. The procedure informs the analyst what variables (from a set of 
selected candidates) to "match" on and what criterion mean to assign for 
item non-respondents with given combinations of characteristics. 
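
The core step of such a search, finding the dichotomous split that accounts for the most variability in the criterion, can be sketched as below. This is a one-level illustration written for this commentary, not the ISR's AID program, which applies the search recursively to each resulting subgroup and has stopping rules not shown here. The names `between_ss` and `best_split`, the predictors, and the data are hypothetical.

```python
from itertools import combinations

def between_ss(left, right):
    """Between-group sum of squares for a two-way split of criterion values."""
    n1, n2 = len(left), len(right)
    m1, m2 = sum(left) / n1, sum(right) / n2
    grand = (sum(left) + sum(right)) / (n1 + n2)
    return n1 * (m1 - grand) ** 2 + n2 * (m2 - grand) ** 2

def best_split(cases, predictors, criterion):
    """Try every dichotomous grouping of each predictor's categories.

    Returns (between_ss, predictor, set_of_categories_on_one_side) for the
    split that explains the most criterion variability.
    """
    best = None
    for p in predictors:
        cats = sorted({c[p] for c in cases})
        for size in range(1, len(cats)):
            for group in combinations(cats, size):
                left = [c[criterion] for c in cases if c[p] in group]
                right = [c[criterion] for c in cases if c[p] not in group]
                bss = between_ss(left, right)
                if best is None or bss > best[0]:
                    best = (bss, p, set(group))
    return best

# Hypothetical cases with complete data on the criterion y.
cases = [
    {"married": "yes", "race": "white", "y": 10.0},
    {"married": "yes", "race": "nonwhite", "y": 11.0},
    {"married": "yes", "race": "white", "y": 12.0},
    {"married": "no", "race": "white", "y": 2.0},
    {"married": "no", "race": "nonwhite", "y": 3.0},
    {"married": "no", "race": "nonwhite", "y": 1.0},
]
bss, predictor, group = best_split(cases, ["married", "race"], "y")

# Subset means that would be assigned to item non-respondents on y.
left = [c["y"] for c in cases if c[predictor] in group]
right = [c["y"] for c in cases if c[predictor] not in group]
means = (sum(left) / len(left), sum(right) / len(right))
```

In this toy data the search selects marital status over race, and the two subgroup means become the candidate values for assignment.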

The ISR assignment method is a variant on the standard pro- 
cedure of assigning subset means. Its particular suitability as a possible 
treatment of the NLS-HS data base lies in the use of the AID program to 
help specify what means to assign. 

Like most other routinized (especially, machine-performed) 
procedures, the ISR approach has certain intrinsic problems. Reliance 
on the AID to produce subset means would require great care to assure 
that assignments are not unintentionally compounded. (An analyst might 
choose to allow compounding, however.) The program does not auto- 
matically discriminate between assigned and actual item values. It is 
conceivable that routine use of all cases for which a value is available 
might result in AID analyses based on a large proportion of assigned 
values. In that case, its power would be vitiated at best or, at worst, 
its outputs might be unreliable. Some protection against compounding 
assignments is needed, which (we presume) is one reason for special 
"flagged" coding of assigned values in the ISR approach. Clearly, the 
problem of compounding by routine machine processing can be forestalled 
by appropriate coding and programming. But NCES would have to warn 
prospective users of the approach that some such protection is needed, 
and might well suggest appropriate safeguards. 
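
One form such a safeguard might take is sketched below: subset means are computed from actually reported values only, and every assigned value carries a flag so that later passes can exclude it. This is an assumed design for illustration, not the ISR's coding scheme; the field name `assigned`, the function names, and the data are hypothetical.

```python
def subset_means(records, key, target):
    """Mean of target within each value of key, using only reported values."""
    sums, counts = {}, {}
    for r in records:
        if r[target] is None or r.get("assigned", False):
            continue  # skip missing values and previously assigned values
        sums[r[key]] = sums.get(r[key], 0.0) + r[target]
        counts[r[key]] = counts.get(r[key], 0) + 1
    return {k: sums[k] / counts[k] for k in sums}

def assign_means(records, key, target):
    """Assign the subset mean to missing cases and flag each assignment."""
    means = subset_means(records, key, target)
    out = []
    for r in records:
        r = dict(r)  # copy so the input file is not altered
        if r[target] is None and r[key] in means:
            r[target] = means[r[key]]
            r["assigned"] = True
        out.append(r)
    return out

# Hypothetical file: the third record carries a value assigned in an
# earlier pass, which must not feed into the new subset mean.
records = [
    {"group": "A", "income": 10.0},
    {"group": "A", "income": 20.0},
    {"group": "A", "income": 100.0, "assigned": True},
    {"group": "A", "income": None},
]
result = assign_means(records, "group", "income")
```

Without the flag check, the earlier assignment of 100.0 would enter the mean and be compounded into the new assignment.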

Educational Opportunity Survey - Office of Education. 
Source: Mayeske, G. W., C. E. Wisler, A. E. Beaton, Jr., F. D. 
Weinfeld, W. M. Cohen, T. Okada, J. M. Proshek, and 
K. A. Tabler; A Study of Our Nation's Schools. Washington: 
DHEW - Office of Education, 1972. 
The Mayeske group's analysis of data from the survey which generated 
the Coleman Report (see above) employed an unusual method of assigning 
values to item non-respondents. For reasons other than data assignment, 
Mayeske wished to create variables scaled in a common metric from a 
diverse set of available variables. To do so, he selected an intrinsically 
interesting "outcome" or criterion variable, and performed criterion 
scaling. 

In brief, Mayeske computed the criterion mean for each cate- 
gory of nominal variables, and/or for each value (or bracketed interval) 
of continuous variables, within a set of variables chosen for use in factor 
analyses and regressions. These variables were then scaled in terms of 
the associated criterion value. The procedure not only put all the "inde- 
pendent" variables into a common metric, but allowed nominal variables 
to be represented in interval form. For example, the distinction between 
males and females might be represented as means of 4.5 and 3.0 in terms 
of a criterion score. 

As a byproduct of the scaling procedure, Mayeske was able 
to determine the criterion mean for item non-respondents on the indepen- 
dent variables. Like any category of item respondents, item non-response 
could be represented by some mean criterion score. Mayeske found 
that, for many independent variables, the item non-respondents scaled 
quite differently from any of the several categories of respondents. (See 
Mayeske et al., pp. 10-11.) 
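
Criterion scaling as described here can be sketched briefly. The code is an illustration under stated assumptions, not Mayeske's program: the function name `criterion_scale`, the "no response" label, and the data are hypothetical, with the male and female means chosen to reproduce the 4.5 and 3.0 of the example above.

```python
def criterion_scale(cases, variable, criterion):
    """Replace each category of variable by the mean criterion score of its members.

    Item non-response (None) is treated as one more category, so it
    receives its own criterion mean rather than a respondent group's mean.
    """
    def label(c):
        return "no response" if c[variable] is None else c[variable]

    groups = {}
    for c in cases:
        groups.setdefault(label(c), []).append(c[criterion])
    means = {k: sum(v) / len(v) for k, v in groups.items()}
    scaled = [means[label(c)] for c in cases]
    return means, scaled

# Hypothetical cases: sex is the nominal variable, score the criterion.
cases = [
    {"sex": "male", "score": 4.0},
    {"sex": "male", "score": 5.0},
    {"sex": "female", "score": 3.0},
    {"sex": "female", "score": 3.0},
    {"sex": None, "score": 1.0},
    {"sex": None, "score": 2.0},
]
means, scaled = criterion_scale(cases, "sex", "score")
```

Here the non-respondents scale at 1.5, well apart from either respondent category, which is the kind of pattern Mayeske reports for many variables.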

Like the Michigan ISR approach, Mayeske's group is, of course, 
using a variant of the traditional method of assigning a category mean to 
item non-respondents. The unusual aspect of the approach is that it does 
not assign the mean for some "matched" group of respondents, but pro- 
vides a value unique to item non-respondents. 

While it has much appeal, this approach must be criticized 
on grounds of theoretical value. Treating item non-respondents as a 
separate category masks as much as it reveals, and information about 
differences in the criterion scores of item non-respondents and respon- 
dents has virtually no theoretical value. If the purpose of education re- 
search is to develop predictive models about relationships between certain 
individual or group characteristics and (say) academic success, there is 
no way to incorporate "non-response to an extracurricular activities 
item" as a variable in the model. 

For correlational analyses (in which factor analysis may be
included), the Mayeske approach is valuable insofar as it maintains the
size and representativeness of a sample, thus permitting greater
confidence in estimates of associations between dependent and independent
variables within the population. But when specification of complex
(multivariate) relationships is the objective of the analysis,
criterion-scaled scores for item non-respondents would seem merely to
muddy the analytical waters.

The Mayeske group suggest (pp. 11-12) that knowledge of a
criterion score for an individual may permit inferences about other
characteristics which have been criterion scaled. We agree, but would
raise a question about the accuracy of such inferences if the variability
within the item non-response category is large (i.e., if the fact of
non-response is poorly correlated with other personal characteristics).
Mayeske gives the example of estimating father's occupation level from a
criterion score by comparing the "don't know" mean with the means for
known categories of father's occupation. Examination of his Table
3.3.1.1 (p. 10) shows that for twelfth graders the "don't know" criterion
mean most closely approximates that for students whose fathers are farm
workers (as does the mean for item non-response). In this instance,
because there is a roughly linear relationship between father's
occupational level and the criterion, Mayeske et al. suggest that a
relatively low criterion score might be used to assign "one of the lower
ranks" to (missing) father's occupation.

Mayeske's data show that no occupational category but "farm
worker" even roughly approximates the "no response" criterion score.
Given this, one would be forced to assign the rank appropriate to "farm
worker." But, because the occupational status of "farm worker" is vague
and because the link between Mayeske's criterion (achievement composite
score) and "father's occupation" is imperfect, following Mayeske's
suggestion would produce doubtful assignments.

For Mayeske's data, occupations are ranked by criterion
score (at grade 12 level) as shown in Figure 4. The location of "farm
worker" in this ranking does not correspond well with its place in the
NORC Scale of Occupational Prestige. In that scale, "farmhand" ranks
somewhat above several semi-skilled jobs (coal miner, taxi driver,
restaurant waiter, bartender), rather than far below the semi-skilled
level as in Mayeske's ranking. Since there is much evidence for the
validity of the NORC prestige score as a correlate of income and
education, we tend to trust the rankings it produces more than those
given by Mayeske's criterion scaling procedure.

It seems very likely that an ad hoc criterion such as that
used by Mayeske would produce many misclassifications. To avoid them,
the analyst would have to use only a few very broad categories and/or
would be forced to make extensive checks on the reasonableness of
assigned values. The need to supplement criterion scores with evidence
external to the study suggests that the method cannot suffice as the
principal basis for assigning values.

Father's Occupation               Grade 12 Criterion Average    Rank

Professional                                56.0                  1
Salesman                                    53.6                  2
Manager                                     52.8                  3
Official                                    52.7                  4
Technical                                   52.4                  5
Farm or Ranch Owner or Manager              50.7                  6
[illegible]                                 50.6                  7
Semi-Skilled                                49.5                  8
Workman or Laborer                          47.2                  9
Farm Worker                                 42.5                 10
Non-response                                42.3                 11
Don't Know                                  41.8                 12

Average                                     50.0

Fig. 4.--Occupational ranking based on child's scholastic achievement
scores. Source: Mayeske et al. (1972:10; Table 3.3.1.1).
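
The assignment rule at issue can be made concrete with the Figure 4
values. The sketch below is ours, not Mayeske's: the function name is
hypothetical, and the rule "take the known category with the nearest
criterion mean" is our reading of the suggestion. The label for rank 7
is illegible in our copy and is carried under a placeholder name.

```python
# Grade 12 criterion averages for the known categories in Figure 4.
criterion_means = {
    "Professional": 56.0,
    "Salesman": 53.6,
    "Manager": 52.8,
    "Official": 52.7,
    "Technical": 52.4,
    "Farm or Ranch Owner or Manager": 50.7,
    "[illegible, rank 7]": 50.6,   # placeholder for the unreadable label
    "Semi-Skilled": 49.5,
    "Workman or Laborer": 47.2,
    "Farm Worker": 42.5,
}

def nearest_category(score, means):
    """Return (category, gap) for the category whose mean is closest to `score`."""
    category = min(means, key=lambda name: abs(means[name] - score))
    return category, abs(means[category] - score)

for label, score in [("Non-response", 42.3), ("Don't know", 41.8)]:
    category, gap = nearest_category(score, criterion_means)
    print(f"{label}: nearest category is {category} (gap {gap:.1f})")

# Gaps between adjacent ranked categories; several are only 0.1 to 0.3
# criterion points wide.
ordered = sorted(criterion_means.values(), reverse=True)
adjacent_gaps = [round(a - b, 1) for a, b in zip(ordered, ordered[1:])]
```

Both missing-data groups land on "Farm Worker," while ranks 3 through 7
are separated by gaps too narrow to support any confident assignment in
the other direction.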

Differences among criterion means for nominal categories,
as in Figure 4, may be just too small or too ill-defined to permit
reliable assignments from the criterion to other variables. Mayeske et
al. themselves seem ambivalent on the matter, since they point out
(p. 10) that curvilinear relationships or other departures from
well-defined linear relationships may make it impossible to use criterion
scores for assigning values on other variables. Nonetheless, they
subsequently (p. 11) summarize the advantages of the criterion scaling
approach with initial emphasis on its potential utility for such
assignments.

The basic problem with employing criterion means for purposes
of assigning missing-data values, we think, lies in the improbability
of a high correlation between the criterion and any one other
characteristic variable. Because of this fundamental weakness in social
science data, some form of multivariate approach (like the Census or ISR
methods) probably will be more adequate as an approach to the problem of
item non-response. In enumerating the advantages of criterion scaling,
Mayeske et al. should have emphasized the fact that criterion scaling
maximizes the linear relationship of the variable with the criterion. We
would, then, tend to discount their implied claims for its usefulness in
assigning missing data.






Project: SCOPE.

Source: Tillery, D. and T. Kildegard. Educational Goals, Attitudes,
        and Behaviors: A Comparative Study of High School Seniors.
        Cambridge, Mass.: Ballinger Publishing, 1973. (Note:
        Tillery is the originator of and senior analyst for Project
        SCOPE.)

These analysts employed the most simple and traditional method of
assigning missing values:

     No variable was used for which the nonresponse rate was
     more than about 14 percent, and in most cases the
     nonresponse was less than 10 percent. Mean values for each
     variable were used for subjects who did not respond.
     (Tillery and Kildegard, 1973:20)

According to this brief statement, the analysts did not even assign sub- 
category means. Although they seem to say they rejected items with 
more than 14 per cent non-response, it is more likely they mean "The 
maximum item nonresponse rate was about 14 per cent, ..." Since 
their approach is poorly explained, but appears quite simplistic, we think 
further discussion is unnecessary. 

Assignments: summation. The chief--here, only--justification
for not assigning values to item non-respondents is stated in Parnes
et al. (1970): ". . . allocated data could not be expected to be
consistent with data from subsequent surveys." The procedure used by the
Michigan ISR group takes account of the need for caution on this ground
by applying special codes indicating assignment and the degree of
uncertainty attached to each assigned value.*
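
A flagged assignment of this general kind might look like the sketch
below. It illustrates the idea only and is not the ISR procedure: the
flag values, field names, and data are hypothetical, a single grouping
variable stands in for the subcategories, and only one "assigned" code
is carried rather than graded codes for degree of uncertainty.

```python
from statistics import mean

# Hypothetical flag codes; the actual ISR code values are not given here.
REPORTED, ASSIGNED, STILL_MISSING = 0, 1, 9

def assign_subcategory_means(records, item, group_by):
    """Fill missing `item` values with the mean of reporting cases in the
    same subcategory, adding a flag so users can set assigned values aside."""
    reported = {}
    for rec in records:
        if rec[item] is not None:
            reported.setdefault(rec[group_by], []).append(rec[item])
    flag = item + "_flag"
    filled = []
    for rec in records:
        new = dict(rec)                       # leave the documentary record intact
        if rec[item] is not None:
            new[flag] = REPORTED
        elif rec[group_by] in reported:
            new[item] = mean(reported[rec[group_by]])
            new[flag] = ASSIGNED
        else:
            new[flag] = STILL_MISSING         # no reporting cases to borrow from
        filled.append(new)
    return filled

records = [
    {"id": 1, "region": "N", "income": 100},
    {"id": 2, "region": "N", "income": 140},
    {"id": 3, "region": "N", "income": None},
    {"id": 4, "region": "S", "income": 80},
    {"id": 5, "region": "W", "income": None},
]
filled = assign_subcategory_means(records, "income", "region")
# An analyst who distrusts assignment keeps only the reported values.
reported_only = [r for r in filled if r["income_flag"] == REPORTED]
```

The flag never enters computations; it only lets a user of the file
choose between the assigned and the reported-only versions of the item.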

We have argued that a carefully assigned value is probably
more useful than no value at all, even in longitudinal analyses. Of those
considered, the procedure employed by the Michigan ISR appears to yield
the most refined estimates of values, and probably the most reliable.
Its acceptability as a variant of the standard approach--assigning
subcategory means--should be enhanced by the fact that it requires few a
priori judgments about what variables, or what values of these variables,
should be employed as the basis for data assignment.

In theory, all variables other than the one to be assigned could
be considered for Michigan's AID procedure if the data base were large
enough to support the statistical manipulations required. The analyst, of
course, will probably exercise judgment in the deletion of variables from
the list of candidates on the basis of expert knowledge or theoretical
grounds. For technical reasons as well, some selectivity would be
required. Few data bases are likely to be large enough to make possible
the simultaneous consideration of all of a large number of available
variables, and the AID computer program limits the number of variables
allowable for any one run.

* These codes, of course, do not enter into computations, but may be
used to warn data tape users that the value is in doubt. This is
probably the best that a data archive can do for its users.

In sum, although it has certain limitations which it shares
with any assignment method, and some which are peculiar, the ISR's
use of the AID procedure to determine what means should be assigned
seems an approach which we can recommend for consideration in the
adjustment of the NLS-HS data base.

Need for empirical studies of item non-response bias.

Is high non-response a real problem? We stated at the outset
that each researcher must decide for himself whether or not to assign
data. We favor doing so, especially when non-response is fairly high
for an item and when there is reason to believe that some systematic bias
is involved. From a social psychological standpoint, we can probably
assume that non-response implies bias. The fact of non-response
distinguishes all non-respondents from all respondents. Whether or not
bias is systematic, i.e., whether the motive for non-response is similar
across individuals or whether they share relevant characteristics, is a
matter which must be determined empirically in each case, as was done
by Mayeske and by Blau and Duncan (1967:471-76).

The data provided by Mayeske provide strong evidence that
item non-response can introduce very serious biases. But comparisons
of certain Census distributions, with and without adjustments for item
non-response, indicate that assignment sometimes yields rather minor
changes even when item non-response is as high as 21 per cent. For
non-response in the range 4.5 to 11.7 per cent, Blau and Duncan (1967:474)
likewise found negligible effects.




This contradictory evidence raises doubt that high rates of
item non-response are necessarily a problem. A cursory examination
of the adjusted and unadjusted Census data for four variables, as indicated
in Tables 4 to 7 (appended), shows that the average change in cell
proportion resulting from missing-data assignments was only about 0.17 per
cent, regardless of the proportion of missing cases (from about 4 per cent
to about 21 per cent in the four tables). The biases in the unadjusted
distributions were systematic, but rather minor. When the change for each
cell is related to the original cell proportion, the modification resulting
from adjustment is about 2 per cent* of the unadjusted cell proportion.
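
The two summary figures just quoted are simple averages over cells, and
the arithmetic can be sketched as follows. The four-cell distribution is
invented for illustration and is not Census data; we also assume here
that changes are averaged without regard to sign.

```python
def adjustment_effects(unadjusted, adjusted):
    """Compare two percentage distributions over the same cells.

    Returns (mean absolute change in percentage points,
             mean change expressed as a per cent of the unadjusted cell)."""
    abs_changes = [abs(adjusted[c] - unadjusted[c]) for c in unadjusted]
    rel_changes = [100 * abs(adjusted[c] - unadjusted[c]) / unadjusted[c]
                   for c in unadjusted]
    n = len(abs_changes)
    return sum(abs_changes) / n, sum(rel_changes) / n

# Hypothetical distribution (per cent of cases in each cell).
unadjusted = {"a": 40.0, "b": 30.0, "c": 20.0, "d": 10.0}
adjusted = {"a": 39.8, "b": 30.1, "c": 20.3, "d": 9.8}

mean_abs, mean_rel = adjustment_effects(unadjusted, adjusted)
print(f"average change: {mean_abs:.2f} points; "
      f"relative to cell size: {mean_rel:.1f} per cent")
```

In this invented case the smallest cell shows the largest relative
change, which is the pattern described for less populous cells in the
Census tables.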

As might be expected, the size of the effects of assignment
is proportional to the number of categories in a distribution and greater
for less populous cells. The effects do not seem to depend, however, on
the proportion of missing-data cases, as comparisons of Tables 4 and 5,
and of 6 and 7, show. It might be supposed that systematic biases would
be more severe as the proportion of item non-respondents increases, but
this is not the case for the Census sample data.** Whether or not this
is owing to the huge size of the data base, to the allocation procedure
employed, or to some combination of these is uncertain.



* Simple mean, over all cells in column 6 of Tables 4 to 7 collectively.

** The data of Tables 4-7 are based on approximately 20 per cent of
1970 households. The figures given there are weighted estimates of
population distributions.




Blau and Duncan (1967) were concerned about the impact of
item non-response upon correlations underlying their investigation of
occupations. They report an effort to assess the effects of systematic
bias in the characteristics of those not reporting "father's occupation"
and "respondent's first occupation." For the age group they examined,
non-response rates for these variables were 11.7 per cent and 4.4 per
cent respectively. Their results show both means and variability were
reduced for a mixture of persons who failed to report at least one
variable.* For "father's occupation," the "non-respondent" mean was 90
per cent of the "complete data" mean. For "respondent's first
occupation," the corresponding figure was 84 per cent. Curiously, the
influence of non-response was less for the variable with greater
non-response.

By an elaborate procedure, Blau and Duncan estimated a value
for the likely "true" population correlation, under varying assumptions
about the unknown correlation between "father's occupation" and
"respondent's first occupation." By their estimate, the influence of item
non-response was minor, probably about the same size as the sampling error
of "r" for their large (N = 33,000) sample. Thus, in this instance, they
concluded that bias attributable to item non-response was not a matter of
concern.

* Computed "non-respondent" means and deviations were based on imputed
values for those not reporting the variable in question, and obtained
values for those not reporting the other variable (of the pair considered).






They express concern, however, for other portions of the
data with item non-response of about 20 per cent, remarking that
correlation coefficients are "especially vulnerable" to misestimation for
these.

Given contradictory evidence, we recommend that NCES
undertake some special empirical studies of the degree of bias introduced
by item non-response in the NLS-HS data.

Since it is too late to conduct personal interviews with samples
of first follow-up item non-respondents (and doubtless too costly to select
samples on an item-by-item basis in any event), a reasonable approach
to such empirical work might be to construct a subsample of known data
cases, representing the survey respondents, then to delete cases so as
to re-create the experienced item non-response. Comparison of the data
from the whole subsample with the data after deletion of cases would
provide some estimate of the effects of item non-response.* Or, NCES might
wish to repeat Blau and Duncan's analysis, using especially troublesome
NLS-HS data. This preliminary step might at the same time provide a
data base on which alternative methods of data assignment could be tried,
to assess consequences of their use.

* Provided, of course, that not only the rate of item non-response but
also the relevant characteristics of non-respondents (as best known)
were re-created in the experiment suggested. Cases would be sampled
within characteristics-defined subcategories to create the hypothetical
"item non-respondents."
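
The deletion experiment can be sketched in a few lines. Everything in
the sketch is assumed for illustration: the function name, the two
subcategories, the deletion rates, and the income figures are invented,
and real work would define subcategories from the known characteristics
of NLS-HS item non-respondents.

```python
import random
from statistics import mean

def delete_cases(cases, stratum, drop_rates, seed=0):
    """Delete a random share of complete cases within each subcategory so
    that the assumed pattern of item non-response is re-created."""
    rng = random.Random(seed)
    by_stratum = {}
    for case in cases:
        by_stratum.setdefault(case[stratum], []).append(case)
    kept = []
    for name, members in by_stratum.items():
        n_drop = round(len(members) * drop_rates.get(name, 0.0))
        dropped = set(rng.sample(range(len(members)), n_drop))
        kept.extend(m for i, m in enumerate(members) if i not in dropped)
    return kept

# Hypothetical complete-data subsample; low-income cases withhold more often.
cases = ([{"group": "low", "income": 4000 + 10 * i} for i in range(100)]
         + [{"group": "high", "income": 12000 + 10 * i} for i in range(100)])

respondents = delete_cases(cases, "group", {"low": 0.30, "high": 0.05})

full_mean = mean(c["income"] for c in cases)
observed_mean = mean(c["income"] for c in respondents)
bias = observed_mean - full_mean   # effect of the simulated non-response
```

The same retained and deleted cases could then serve as a test bed:
each assignment method is applied to the deleted cases and its assigned
values are compared with the known ones.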




A possible method of reducing dependence on "expert judgment"
as a basis for assignments. Procedures such as the AID program help
reduce the bounds within which judgment must be exercised, but even this
rather sophisticated approach does not replace judgment. Although we
have given some thought to the matter, we have not conceived of any
method which would allow complete elimination of judgment by empirical
evidence.

A definitive study would require knowledge of the reasons for
item non-response and an assessment of the relationship of each of the
several reasons with the "true" values of the missing data. Given the
data presently available, such an analysis cannot be performed. Were
there sufficient concern to warrant the expense of special follow-up
studies, reasons for non-response might be obtained from item
non-respondents to future waves of the NLS-HS survey.

For certain items, interest in improving accuracy of the data
might justify such an effort. "Reasons why" information, however,
is notoriously subject to various forms of distortion owing to factors such
as socially acceptable response, rationalization, or creation of artificial
justifications.* Given such problems, plus recall error, the value of
follow-ups aimed at discovering motives for item non-response seems
doubtful.

* That is, the respondent makes up something to satisfy the inquiry, even
though there was not--or he cannot specify--any particular reason for
non-response at the time it occurred.

It might, however, be worthwhile to make reasonable estimates
of the motives underlying failure to respond and, from such
estimates, to narrow the scope of required judgments. The approach
sketched below would be time-consuming and costly; therefore,
consideration of its use should be limited to items which are of critical
importance and have unacceptably high rates of item non-response.

We assume, as the basis for this approach, that different
reasons for item non-response will be associated with varying "true"
values for the variable in question. That is, we suppose that those
non-respondents who intentionally conceal data will tend to differ from
those whose non-response is the result of error in following skip
patterns, etc. We suggest that non-respondents can be described in terms
of certain patterns of response to the total questionnaire, or to several
follow-ups, as well as in terms of their personal or contextual
characteristics.

Mayeske et al. have shown that the mean criterion scores for
non-respondents (undifferentiated as to motive) tend to differ from those
for specified categories of respondents. Our suggestion takes this
evidence as the basis for the assumption that differently motivated
non-response will likewise exhibit differences in item values. The
problem, of course, is that for the item in question values are not
available for non-respondents. Thus, some way of estimating appropriate
values for each type of non-response must be found. Our suggestion is a
multi-stage procedure which might yield fairly refined estimates. It is
a variant on the standard method of assigning means or other measures of
central tendency.

The THAID algorithm used by the University of Michigan
Institute for Social Research forms the basis of the method. This
procedure* locates those variables, in a set of candidates, which maximize
the difference in distributions of cases over a set of categories for a
criterion variable. By iterations, the program yields information about
which candidate predictors are the most powerful (in terms of
differentiating the distributions) and what values of each predictor are
associated with varying distributions. The program seems uniquely suited
to analysis of item non-response in terms of estimated motive, as
discussed below. It enters into the overall procedure in the final stages.

* A brief description of the program is given in Morgan et al. (1974)
and a detailed account appears in Morgan and Messenger (1973).

The steps necessary to the procedure are:

1. Classification. If we assume that item non-response can
stem from any of the several sources like those listed below, the first
step would be classification of each item non-respondent into one of
several categories, on the basis of an edit of the questionnaire:

a. Administrative error - questionnaires with missing pages,
   illegibly printed pages or items, or the like, which can account for
   item non-response.






b. Respondent error - indicated by evidence of respondent difficulty
   in following the questionnaire, such as frequent routing errors,
   failures to follow item instructions, many inconsistent responses,
   and the like.

c. Respondent lack of information or indecision - indicated by
   patterns of response which suggest that, though cooperative, the
   respondent is unable to provide specific information. Such
   patterns might include frequent use of "don't know," "undecided,"
   multiple responses, and the like.

d. Respondent deviance - indicated by patterns of response which
   suggest that response options provided are inadequate to the
   peculiar situation of the respondent, such as frequent use of
   uncodable or "other" responses.

e. Limited time/patience - patterns which indicate that the
   respondent simply quit responding, after having done so at the outset:
   all item non-response concentrated in "blocked" portions of the
   questionnaire, with complete and consistent responses in other
   portions.

f. Intent to mislead - patterns of response which suggest that the
   respondent intended to mislead or simply confound the analyst,
   such as frequent "out of range" responses, highly unlikely single
   responses or combinations, including face sheet items (e.g.,
   Puerto Rican ethnicity and Shinto religion), and the like.






g. Intent to conceal - patterns of omission which "flag" certain
   items in a block of related items as intentionally omitted to
   conceal; for example, failure to supply income in the midst of
   completed answers on other employment or standard-of-living items,
   especially if no other motives are suggested.

h. Mixed motives - presence of indications that item omission is
   probably part of more than one pattern of motivation, such as
   both error and lack of information or indecision.

i. Indeterminate - a residual category covering cases for which no
   clear patterns are found.

Interviewer notes and comments might be used to supplement
study of response patterns in the categorization of item non-response
motives.
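
The classification step could be mechanized along the following lines.
This is a sketch under stated assumptions, not a tested edit
specification: the indicator names are hypothetical outputs of a
questionnaire edit, and every threshold is arbitrary and would have to
be set from experience with the actual instrument.

```python
def classify_motive(edit):
    """Assign one motive category (a-i above) to an item non-response,
    given a dict of hypothetical indicators from a questionnaire edit."""
    signals = []
    if edit.get("missing_or_illegible_page"):
        signals.append("administrative error")
    if edit.get("routing_errors", 0) >= 3:
        signals.append("respondent error")
    if edit.get("dont_know_count", 0) >= 5:
        signals.append("lack of information or indecision")
    if edit.get("other_or_uncodable_count", 0) >= 5:
        signals.append("respondent deviance")
    if edit.get("omissions_in_one_block"):
        signals.append("limited time/patience")
    if edit.get("out_of_range_count", 0) >= 3:
        signals.append("intent to mislead")
    if edit.get("isolated_sensitive_omission"):
        signals.append("intent to conceal")

    if not signals:
        return "indeterminate"          # residual category (i)
    if len(signals) > 1:
        return "mixed motives"          # category (h)
    return signals[0]

examples = [
    {"isolated_sensitive_omission": True},
    {"routing_errors": 4, "dont_know_count": 6},
    {},
]
labels = [classify_motive(e) for e in examples]
```

A rule set of this kind does not remove judgment; it records the
judgments once, so that they are applied uniformly and can be inspected.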

2. Selection of candidate predictor variables. Indiscriminate
inclusion of all available variables among the THAID candidates would be
inefficient. Since the final step involves relating subcategories to item
values, only those variables which are highly correlated with the item
under consideration should be included among the candidates. Thus, the
second step of the procedure calls for an examination of correlation
matrices,* derived from the item respondents, to determine which variables
have a high zero-order correlation with the criterion (motive). Candidate
variables should have this property and their intercorrelations should
be relatively low. From a large number of variables, perhaps as many
as 15 might be selected for final inclusion. It is to be emphasized that
the selection procedure is wholly empirical--there need be no
interpretable "reason" for high correlation between the criterion item and
a candidate variable, since the objective is confined to prediction.

* We use the term somewhat loosely. The matrix would have to contain
a mix of various measures of association, not necessarily the Pearson
"r" often suggested by "correlation."
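
One way to carry out such a screening is a simple greedy pass, sketched
below. The sketch uses the Pearson "r" throughout purely for brevity,
whereas a real matrix would mix measures of association; the function
names, the cutoffs of 0.3 and 0.8, and the data are all assumptions of
ours. Only the ceiling of 15 candidates comes from the text.

```python
from math import sqrt

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def select_candidates(columns, criterion, min_r=0.3, max_inter_r=0.8, limit=15):
    """Keep variables strongly related to the criterion and only weakly
    related to variables already kept, strongest first."""
    strength = {name: abs(pearson(values, criterion))
                for name, values in columns.items()}
    chosen = []
    for name in sorted(columns, key=strength.get, reverse=True):
        if strength[name] < min_r or len(chosen) == limit:
            break
        if all(abs(pearson(columns[name], columns[other])) <= max_inter_r
               for other in chosen):
            chosen.append(name)
    return chosen

criterion = [1, 2, 3, 4, 5, 6]
columns = {
    "x1": [1, 2, 3, 4, 5, 7],      # strong predictor
    "x2": [2, 4, 6, 8, 10, 15],    # strong, but nearly duplicates x1
    "x3": [2, 1, 4, 3, 3, 4],      # moderate predictor
    "x4": [3, 1, 2, 3, 1, 2],      # weak predictor
}
candidates = select_candidates(columns, criterion)
```

Here x2 is dropped as redundant with x1 and x4 as too weakly related to
the criterion, leaving x1 and x3 as candidates.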

3. The candidate variables are entered as predictors in the
THAID program, with non-response motive as the categorical dependent
variable. The program selects combinations which best discriminate
distributions, thus yielding "best estimates" of characteristics associated
with membership in a motive category.

4. For each category of motivation, the combinations of
characteristics yielded by THAID can be utilized to identify a subset of
item respondents, for which a summary statistic--mean, median, mode--
can be computed.
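
Step 4 can be sketched once step 3 has produced a profile of
characteristics for each motive category. The sketch does not implement
THAID; the profiles are simply written in by hand as hypothetical
output, and the variable names and income figures are invented. The
median is used here, but the mean or mode could be substituted.

```python
from statistics import median

def matches(case, profile):
    """True if the case has an allowed value on every profile variable."""
    return all(case.get(var) in allowed for var, allowed in profile.items())

def values_to_assign(respondents, item, profiles):
    """For each motive category, summarize `item` over the item
    respondents whose characteristics match that category's profile."""
    out = {}
    for motive, profile in profiles.items():
        values = [r[item] for r in respondents if matches(r, profile)]
        out[motive] = median(values) if values else None
    return out

# Hypothetical item respondents (income reported).
respondents = [
    {"employed": True, "region": "urban", "income": 9000},
    {"employed": True, "region": "urban", "income": 11000},
    {"employed": True, "region": "rural", "income": 7000},
    {"employed": False, "region": "rural", "income": 2000},
    {"employed": False, "region": "urban", "income": 3000},
]

# Hypothetical stand-in for THAID output: characteristics that best
# distinguish each motive category among item non-respondents.
profiles = {
    "intent to conceal": {"employed": {True}, "region": {"urban"}},
    "respondent error": {"employed": {False}},
}

assignments = values_to_assign(respondents, "income", profiles)
```

Each item non-respondent would then receive the value computed for his
motive category, with an assignment flag attached.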

The chief advantage we see in such an approach is that it
seeks to take account of motives for non-response as a variable plausibly
associated with criterion item values. It departs from traditional ways
of assigning means only by considering information besides the customary
background characteristics of the item non-respondent as a basis for
matching him to some subset of respondents. Where such motives as
intent to mislead or to conceal underlie item non-response, there is good
a priori reason to suppose that some peculiarity in the respondent's
situation, directly affecting his true item value, has induced the omission
of the item. Likewise, respondent error and lack of information may
reflect personality or ability factors which, in turn, may bear upon the
respondent's experience and his standing relative to otherwise similar
respondents. We would, of course, like to be able to spot deviant cases,
those respondents whose circumstances depart so far from the norm that
precoded response options are inadequate. For such reasons, there seems
justification--for crucial and high non-response items--to undertake some
effort like the one suggested.

Closing comment. The "state of the art" of adjusting data
for item non-response appears primitive, despite the existence of some
rather sophisticated techniques. What we have found wanting are not
procedures for manipulating data, but rather statements of the logical
underpinnings and accompanying empirical evidence of the consequences of
data assignment. At present, each researcher seems on his own except
for traditional--but not well examined--treatments.

It is especially unfortunate that what efforts have been made 
appear to focus chiefly on adjusting distributions to compensate for errors 
in static population description induced by missing data. The potentially 
more important matter of adjusting individual records, for longitudinal 
analysis of processes, seems almost unexamined. 

We think NCES or its contractors would make a significant
contribution to both the value of the NLS-HS data and the state of the art
of longitudinal analysis by such methodological studies as those we have
sketched. Hence, our strong recommendation that such efforts be
undertaken.

POSTSCRIPT 

This paper in draft form has stimulated discussion of the
problem of item non-response and data quality among present and
prospective users and the governmental and private organizations
responsible for the NLS surveys.

Those discussions have generated some points of agreement
as well as some controversy. All participants appear to accept the
critique of the questionnaire as too detailed and too complex for a
mail-out survey. Yet there seems little possibility that any major
improvement can be made for the third follow-up and, we are told, it is
likely that the questionnaire will be even more difficult in that wave,
because Federal agencies with an interest in the cohort sampled have
succeeded in adding items to the survey. None have been willing to delete
any of the details sought in the first follow-up. Whether this survey can
bear the burden of gathering so much disparate information remains to be
seen. We have doubts, even though the contractor has planned for
telephone call-backs, to obtain critical information, for about half the
sample respondents.

Some of the difficulties cited in this paper have been corrected
retroactively, and some modifications in the questionnaire graphics have
been made. Conditional item responses now are coded, where required,
to include a flag for inconsistency with the "parent" routing item. In the
second and later questionnaires, SKIP instructions have been reworded
to "GO TO" and printed in red.

Despite these useful modifications, many coding and format 
problems remain. The survey contractor is considering our suggestions 
on coding and formatting, but will be unable to test any of the latter for 
possible use in the third follow-up because necessary instrument approval 
and logistical preparations cannot be changed so shortly before field 
pretests are to be conducted. 

As might be expected, considerable controversy has been
raised about suggestions concerning preparation of an analysis-oriented
data file, especially on the possibility of including assigned values for
missing data. The National Center for Education Statistics, the
responsible Federal agency, opposes the assignment suggestion on grounds
like those given by Parnes et al. (see text, p. 88) and on grounds of cost.
Others join NCES in arguing that the "state of the art" provides no
generally accepted method for estimating the values to be assigned (a point
we stress in the text). One participant opposes the suggestion because
researchers with differing problems may wish to use methods other than
those which might be adopted for creation of the analysis file.

Some comment on these objections is warranted. We stress
repeatedly that the documentary file should be retained to accommodate
researchers who wish to devise their own methods, and it is evident that
an assignment "flag" code would permit such researchers to ignore
assigned values in a file. Our objections to the Parnes position are
given in the text, but we add that Parnes' position was taken with
reference to a data base which differs in important respects from the one
under discussion. The Parnes base has item non-response rates much lower
than those cited here (rarely exceeding 10 per cent) and its data were
obtained chiefly by personal interviews conducted by Census-trained
personnel. Under such circumstances, the policy on missing data might
well differ considerably from what is appropriate for the surveys of the
Class of 1972.

The most cogent objections to data assignment are those based
on "state of the art" and cost. The concluding portion of our paper
discusses critically the assignment of missing data, and recommends a
program of methodological studies intended to investigate whether any
method of data assignment will markedly affect population estimates and,
if so, which method seems most appropriate for this data base.

Such a program would be costly, and its results might not
yield assigned values acceptable to all users. Nonetheless, we still
assert that some effort to "fill in" missing data is highly desirable for
longitudinal analysis, so long as the estimation/assignment procedure
chosen provides well-grounded and clearly flagged values.




We have pointed out in discussions that a decision to omit
assigned values has serious cost implications for users, some of whom
will perform their own (possibly duplicative) adjustments of the data.
Some analyses may be foregone because otherwise competent analysts
lack data processing facilities or skills to modify the data. Some
misleading policy "information" may flow from analyses based on that
self-selected portion of the sample which responded to a particular item
or set of items.

Against the background of a study which reportedly has cost
upwards of five million dollars thus far, the expenditure of time and funds
to assess methods of data assignment seems well justified. The benefits
flowing from these costs would be a data base accessible to a wide variety
of potential users, some assurance that information based on the NLS
data is grounded on the best estimates that current survey methodology can
provide, and a substantial contribution to the "state of the art" of
longitudinal survey analysis.

Clearly, NCES should not offer only a data file bearing assigned
values. Neither should it provide assigned values or a manual for making
assignments without first pursuing the necessary methodological studies
upon which to ground its recommendations. Although we have been
audacious enough to recommend one particular method among those we
reviewed, we urge that NCES launch its own investigations and draw others
into the discussion. We hope that many interested parties will volunteer
empirical evidence and/or opinions, so that the debate can be intensified.




WORKS CITED 



Blau, P. M. and O. D. Duncan.
    1967    The American Occupational Structure. New York: Wiley.

Bureau of the Census.
    1973    Census of Population: 1970; Vol. 1, Characteristics of the
            Population, Part 1, United States Summary--Sec. 2.
            Washington, D.C.: Government Printing Office.

Bureau of the Census.
    1974    Indexes to Survey Methodology Literature, Technical Paper
            No. 34. Washington, D.C.: Government Printing Office.

Center for Human Resource Research.
    1975    The National Longitudinal Surveys Handbook. Columbus, Ohio:
            CHRR, The Ohio State University.

Claudy, J. G. (ed.)
    1972    The Project TALENT Data Bank: A Handbook. Palo Alto,
            Calif.: American Institutes for Research.

Coleman, J. S., E. Q. Campbell, C. J. Hobson, J. McPartland, A. M.
Mood, F. D. Weinfeld, R. L. York.
    1966    Equality of Educational Opportunity; DHEW Pub. No.
            OE-38001. Washington, D.C.: Government Printing Office.

Finlayson, S. (ed.)
    1972    A Panel Study of Income Dynamics: Study Design, Procedures,
            Available Data--1968-1972 Interviewing Years (Waves I-V),
            Volume I. Ann Arbor, Mich.: Institute for Social Research,
            The University of Michigan.

Flanagan, J. C., M. F. Shaycroft, J. M. Richards, Jr., and J. G. Claudy.
    1971    Five Years After High School. Palo Alto, Calif.: American
            Institutes for Research and the University of Pittsburgh.

Mayeske, G. W., C. E. Wisler, A. E. Beaton, Jr., F. D. Weinfeld,
W. M. Cohen, T. Okada, J. M. Proshek, and K. A. Tabler.
    1972    A Study of Our Nation's Schools; DHEW Pub. No. (OE) 72-142.
            Washington, D.C.: Government Printing Office.






Morgan, J. N. and R. Messenger.
    1973    THAID: A Sequential Analysis Program for the Analysis of
            Nominal Scale Dependent Variables. Ann Arbor, Mich.:
            Institute for Social Research, The University of Michigan.

Morgan, J. N., K. Dickinson, J. Dickinson, J. Benus, and G. Duncan.
    1974    Five Thousand American Families--Patterns of Economic
            Progress. Vol. 1, An Analysis of the First Five Years of
            the Panel Study of Income Dynamics. Ann Arbor, Mich.:
            Institute for Social Research, The University of Michigan.

Nie, N. H., C. H. Hull, J. G. Jenkins, K. Steinbrenner, D. H. Bent.
    1975    Statistical Package for the Social Sciences, Second Edition.
            New York: McGraw-Hill.

Parnes, H. S., R. C. Miljus, R. S. Spitz, and Associates.
    1970,   Career Thresholds: A Longitudinal Study of the Educational
    1971    and Labor Market Experience of Male Youth (3 Vols.); U.S.
            Department of Labor Manpower Research Monograph No. 16.
            Washington, D.C.: Government Printing Office.

Research Triangle Institute.
    1975    National Longitudinal Study of the High School Class of 1972:
            Base-Year and First Follow-up Data File User's Manual
            (Preliminary). Research Triangle Park, N.C.: RTI.

Shea, J. R., R. D. Roderick, F. A. Zeller, A. I. Kohen, and Associates.
    1971    Years for Decision: A Longitudinal Study of the Educational
            and Labor Market Experience of Young Women (Vol. I); U.S.
            Department of Labor Manpower Research Monograph No. 24.
            Washington, D.C.: Government Printing Office.

Sonquist, J. A., E. L. Baker, and J. N. Morgan.
    1973    Searching for Structure (2nd Edition). Ann Arbor, Mich.:
            Institute for Social Research, The University of Michigan.

Tillery, D. and T. Kildegard.
    1973    Educational Goals, Attitudes, and Behaviors: A Comparative
            Study of High School Seniors. Cambridge, Mass.: Ballinger
            Publishing.






TABLE 2

BASIC ITEM CONTENT, BY RATE OF USABLE
RESPONSE AND NUMERICAL SEQUENCE

95 to 100 Per Cent

Item
Number      Paraphrased Content

F21         Any training program after high school?
F48A(1)     Was respondent working in October 1973?
F58A        Number of weeks worked, October 1972 to October 1973
*BY2        Type of high school program
*BY5        High school grades
*BY8        Average weekly hours worked during high school
*BY83       Any work-limiting physical handicap?
*BY84       Respondent's race or ethnic group
*BY92       Respondent's religion
*BY94A to
*BY94K      Parental home possessions
*BY95       Base year residence area type and size

90 but less than 95 Per Cent

F1A         Present activity: working
F1C         Present activity: taking academic courses at a college
F4          With whom living, October 1973
F5          Kind of dwelling, October 1973
F6A         October 1973 residence area type and size
F6B         Distance, October 1973 residence from base year residence
F9          Was respondent financially dependent in October 1973?
F10         Number others financially dependent on respondent,
            October 1973
F12         Schooling aspirations
F14         Schooling expectations
F16A        Expected activity, October 1974: working
F19         Expected occupation, at age 30

See notes at end of Table 2, p. 132.




TABLE 2 (Cont'd)

BASIC ITEM CONTENT, BY RATE OF USABLE
RESPONSE AND NUMERICAL SEQUENCE

Item
Number      Paraphrased Content

F25         Was respondent taking courses at any school, first week
            of October 1973?
F49A        Kind of job held, October 1973
F49G(2)     Currently working in this job?
F54A        Was respondent working in October 1972?
F58C        Number of employers, period October 1972 to October 1973
F78A(3)     Father's education
F78B(3)     Mother's education
F80A        Did mother work when respondent was in high school?
F81         Did respondent apply for college admission before October
            1973?

85 but less than 90 Per Cent

F1B         Present activity: taking vocational or technical courses
F1D         Present activity: on active duty in Armed Forces or in
            service academy
F1E         Present activity: homemaker
F1F         Present activity: unemployed
F7A         Marital status, first week of October 1973
F11B(4)     Spouse's total 1973 income
F13A        Amount willing to borrow for schooling
F16B        Expected activity, October 1974: taking vocational or
            technical courses
F16C        Expected activity, October 1974: taking academic courses
            in college
F22AA       Type training program since high school: on-job training
F22C        How long did training program last?
F22D        Has respondent completed training program?
F22E        Has respondent used training on any job?
F23(5)      Has respondent attended any kind of school since leaving
            high school?

See notes at end of Table 2, p. 132.




TABLE 2 (Cont'd)

BASIC ITEM CONTENT, BY RATE OF USABLE
RESPONSE AND NUMERICAL SEQUENCE

Item
Number      Paraphrased Content

F29A        Was respondent taking courses at any school during
            October 1972?
F39         Has respondent attended any other school since high
            school?
F42         Was respondent working toward any degree, certificate,
            or license, first week of October 1973?
F43         Since leaving high school and before October 1973, has
            respondent earned any certificate, license, diploma,
            or degree?
F48C        Was respondent looking for work, September 1973?
F54C        Did respondent look for work, October 1972?
F55A        Kind of job held, October 1972
F58B        Number of weeks unemployed, period October 1972 to
            October 1973
F79         Father's occupation
F80B        Did mother work when respondent was in grade school?
F80C        Did mother work before respondent was in grade school?
BY90B(3)    Mother's education

80 but less than 85 Per Cent

F2          Did respondent complete high school?
F3A         Month left last high school
F3B         Year left last high school
F13B        Did anyone discuss borrowing for schooling?
F16D        Expected activity, October 1974: active duty in Armed
            Forces
F16E        Expected activity, October 1974: homemaker
F22B        Kind of work trained for, in post-high school training
            program
F24P        Reason for not continuing education: earn own money
F41B        Did any school attended give credits?

See notes at end of Table 2, p. 132.






TABLE 2 (Cont'd)

BASIC ITEM CONTENT, BY RATE OF USABLE
RESPONSE AND NUMERICAL SEQUENCE

Item
Number      Paraphrased Content

F50A        Average weekly hours worked, job held October 1973
F50B        Average weekly earnings, job held October 1973
F56B        Average weekly earnings, job held October 1972
F82B        Admitted to school applied to before October 1973?
F82C        Request financial aid, school applied to before October 1973?
BY90A(3)    Father's education

75 but less than 80 Per Cent

F24A to
F24O and
F24Q        Various reasons for not continuing education after
            high school
F28A        Field of study (major), October 1973
F30(7)      Did respondent attend the same school in October 1972
            and October 1973?
F33B        Classified as full-time student, October 1972
F33C        Number of class hours per week, October 1972
F34         Was field of study the same in October 1972 and October
            1973?
F56A        Average weekly hours worked, job held October 1972
F83AA       No second-choice school applied to before October 1973
*BY93       Parents' income in base year

70 but less than 75 Per Cent

F11A        Respondent's total 1973 income
F22AB to
F22AH       Various training programs in which respondent participated
            after high school and before October 1973
F26B        Kind of school attended, October 1973

See notes at end of Table 2, p. 132.




TABLE 2 (Cont'd)

BASIC ITEM CONTENT, BY RATE OF USABLE
RESPONSE AND NUMERICAL SEQUENCE

Item
Number      Paraphrased Content

F26C        School attended October 1973 public or private?
F27AA       Month first attended school of October 1973
F27AB       Year first attended school of October 1973
F27B        Classified as full-time student, October 1973?
F27C        Number of class hours per week, October 1973
F27D        Classified as freshman or sophomore, October 1973?
F28B(8)     Field of study October 1973 academic or vocational?
F28C        How long to complete program (major) enrolled in as of
            October 1973?
F46AA       Total cost of schooling, first year after high school

65 but less than 70 Per Cent

F1G         Present activity: other
F46AB       Number of months to spend total cost of schooling, first
            year after high school
F47AA       First (listed) source, money for schooling first year after
            high school

60 but less than 65 Per Cent

F13C        Was there any change in borrowing plans?
F16F        Expected activity, October 1974: other
F32C        School attended October 1972 public or private?
F37         Did respondent drop out of school attended in October 1972?
F47AB       Amount of schooling money from first-listed source, first
            year after high school
F83B        Was respondent accepted by second-choice school applied
            to before October 1973?
F83C        Request financial aid, second-choice school applied to
            before October 1973

See notes at end of Table 2, p. 132.






TABLE 2 (Cont'd)

BASIC ITEM CONTENT, BY RATE OF USABLE
RESPONSE AND NUMERICAL SEQUENCE

Less than 60 Per Cent

Item
Number      Paraphrased Content

F11C        Respondent's wage and salary income, 1973
F11D        Spouse's wage and salary income, 1973
F11E        Respondent's scholarship income, 1973
F11F        Spouse's scholarship income, 1973
F11G        Respondent's miscellaneous income, 1973
F11H        Spouse's miscellaneous income, 1973
F29BA to
F29BR       Various reasons for not continuing education right after
            high school (by October 1972)
F31A to
F31K        Various reasons for changing schools between October
            1972 and October 1973
F32B        Kind of school attended, October 1972
F40B        Kind of other school attended, anytime after high school
F40DA       Is respondent currently attending this other school?
F41CB       Number of semester credits accrued by October 1973
F41CC       Number of other type credits accrued by October 1973
F46BA       Expenditures for tuition and fees, first year after high
            school
F46BB       Expenditure for room and board, first year after high
            school
F46BC       Expenditure for books and supplies, first year after high
            school
F46BD       Expenditure for transportation, first year after high school
F46BE       Expenditure for miscellaneous school-related items, first
            year after high school
F47BA       Second source of schooling money, first year after high
            school
F47BB       Amount from second listed source, first year after high
            school
F47CA       Third source

See notes at end of Table 2, p. 132.






TABLE 2 (Cont'd)

BASIC ITEM CONTENT, BY RATE OF USABLE
RESPONSE AND NUMERICAL SEQUENCE

Item
Number      Paraphrased Content

F47CB       Amount from third source
F47DA       Fourth source
F47DB       Amount from fourth source
F47EA       Fifth source
F47EB       Amount from fifth source
F47FA       Sixth source
F47FB       Amount from sixth source
F47GA       Seventh source
F47GB       Amount from seventh source
F82DA       Amount of scholarship aid offered, first choice school
            applied to before October 1973
F82DB       Amount of loan aid offered, first choice school
F82DC       Amount of promised job aid offered, first choice school
F83DA       Amount of scholarship aid offered, second choice school
F83DB       Amount of loan aid offered, second choice school
F83DC       Amount of promised job aid offered, second choice school

NOTES: Item content is paraphrased from the wording of the First Follow-
       Up Questionnaire. Item numbers are those employed for the
       response distribution published in the User's Manual.

       Items prefaced by *BY are background variables for which data
       were collected from 4,539 individuals via Form B of the First
       Follow-Up Questionnaire. Data for these cases are included in
       the published distributions for Base Year Questionnaire variables.
       Response rates for *BY items thus are based chiefly on data
       collected in the Base Year administration and are not entirely
       comparable to those for items collected exclusively in the first
       follow-up survey.

   (1) Rate excluding routing-error coded responses is 95.0 per cent;
       including error-coded responses, rate is 99.3 per cent.

   (2) Rate excluding routing-error coded responses is 91.4 per cent;
       including error-coded responses, rate is 99.1 per cent.




TABLE 2 (Cont'd) 

BASIC ITEM CONTENT, BY RATE OF USABLE 
RESPONSE AND NUMERICAL SEQUENCE 



   (3) Items BY90A (Father's education) and BY90B (Mother's education)
       are not starred, and are based on data supplied only via
       the Base Year Questionnaire. They overlap items F78A and
       F78B (Father's and Mother's education), obtained from all
       respondents via the First Follow-up Questionnaire. The two
       items (BY90 and F78) employ different response categories,
       and response rates are based on different sample sizes (16,683
       and 21,350, respectively).

   (4) Estimated rate. Published rate = 16.5 per cent, owing to
       oversized eligible base. Discussed in Sec. 1 of the paper.

   (5) Rate excluding routing-error coded responses is 87.5 per cent.
       If error-coded responses are included, rate is 99.7 per cent.

   (6) Rate excluding routing-error coded responses is 85.5 per cent.
       If error-coded responses are included, rate is 91.6 per cent.

   (7) Rate excluding routing-error coded responses is 77.8 per cent;
       including error-coded responses, rate is 83.4 per cent.

   (8) Rate excluding routing-error coded responses is 70.9 per cent;
       including error-coded responses, rate is 75.4 per cent.
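
The two calculations behind these notes, a usable-response rate stated with and without routing-error coded responses, and a published rate restated on a smaller eligible base, can be sketched as follows. This is only an illustration in Python, not a procedure taken from the User's Manual: the function names and the counts in the first example are hypothetical, and the second example assumes the eligible bases shown for item F11B in Table 3 (21,350 published, 4,050 revised).

```python
def usable_rate(usable, routing_error, eligible, include_routing_error=False):
    """Per cent of eligible respondents counted as giving a usable response.

    usable        -- count of within-limits responses (hypothetical input)
    routing_error -- count of responses carrying a routing-error code
    eligible      -- number of respondents eligible to answer the item
    """
    if eligible <= 0:
        raise ValueError("eligible must be positive")
    numerator = usable + (routing_error if include_routing_error else 0)
    return 100.0 * numerator / eligible


def rebase_rate(published_rate, published_base, revised_base):
    """Restate a published per cent on a revised (smaller) eligible base."""
    if revised_base <= 0:
        raise ValueError("revised_base must be positive")
    # Recover the approximate number of usable responses, then divide
    # by the revised number of eligibles.
    responders = published_rate / 100.0 * published_base
    return 100.0 * responders / revised_base


# Hypothetical item: 850 usable and 120 routing-error responses, 1,000 eligible.
strict = usable_rate(850, 120, 1000)
lenient = usable_rate(850, 120, 1000, include_routing_error=True)
print(f"excluding routing errors: {strict:.1f}; including: {lenient:.1f}")

# Published rate of 16.5 per cent on 21,350 cases, restated for 4,050 eligibles.
print(f"rebased rate: {rebase_rate(16.5, 21350, 4050):.1f}")
```

Because the published rate is itself rounded, a rebased figure computed this way may differ in the last digit from an estimate worked from the underlying counts.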



National Longitudinal Study of the High School Class of 1972: 
Base-Year and First Follow-up Data File User's Manual 
(Preliminary). Research Triangle Park, N.C. : Research 
Triangle Institute, April 1975. 






TABLE 3 



RESPONSE DISTRIBUTIONS, SELECTED FIRST 
FOLLOW-UP AND BASE-YEAR ITEMS 



                                Unusable Responses            Non-Response
        Number                  Routing-Error   "Garbage"     PARTIAL
Item    Eligible    Usable      Codes           Codes a       RESPONSE    BLANK
Number  to Answer   Responses   20  40  60      (94 to 97)    (93)        (98)
        (100%)      (%)         (%)             (%)           (%)         (%)


F1A 


21,350 


94,6 


* 


5,2 


0,1 


FIB 


21,350 


87*3 




12,6 


A 1 

0,1 


F1C 


21,350 


90,0 


0,1 


9.8 


0.1 


FID 


21,350 


86.6 


* 


13.3 


0.1 


FIE 


21,350 


86,2 




13.7 


0.1 


F1F 


21,350 


86.4 


* 


13,5 


0.1 


FIG 


21,350 


65,7 




34.1 


0.1 


F2 


21,350 


80,2 * 8,1 


* 




11,6 


F3A 


21,312 


80,8 
80,9 


* 

0,1 




19,1 
19,0 


FSB 
F4 


21,312 
21,350 


93,2 


0,3 




6.5 


F5 


21,350 


94.0 


0,1 




5,9 


F6A 


21,350 


92,7 


0,3 




6.9 


F6B 


21,350 


94,5 


0.1 




5.4 


F7A 


21,350 


87-1 1.2 0.5 


0.1 




11,1 


F7B 


6,073 


55.6 


0.2 




44,2 


F7C 


6,073 


54,9 


0,2 




44.9 


F8A 


6,073 


56*8 0,9 0.6 






41.7 


See notes at end of Table 3, pp. 145-147.









LEGITSKIP 

(99) 
(number) 

1,048 
1,048 
1,048 
1,048 
1,048 
1,048 
1,048 
1,048 
1,086 
1,086 
1,048 
1,048 
1,048 
1,048 
1,048 
16,325 
16,325 
16,325 






TABLE 3 (Cont'd) 



RESPONSE DISTRIBUTIONS, SELECTED FIRST 
FOLLOW-UP AND BASE- YEAR ITEMS 



Item 
Number 



F8B 

F9 

F10 

F11A 

FllB 

W 
F11C 

(b) 



IE 



w 

Fil 

w 

Fll 

W 



(b) 
FUH 

« 
F12 



ser 
Eligible 
to Answer 

w 

3,739 
21,350 
21,350 
^1,350 
21,350 
(4,050) 
21,350 
(20,496) 
21,350 
(4,050) 
21,350 
(20,496) 
21,350 
(4,050) 
21,350 
(20,496) 
21,350 
(4,050) 
21,350 



Unusable Res ponses 

" "Gartage" 



sable 
Responses 

27.7 
93.4 
94.2 
72.0 
16.5 
(86.9) 
46.0 

(47,9) 
10.8 
(57.2) 
26.0 

(27.1) 
7.6 

(40.1) 
23.0 
(23.9) 

7.7 
(40.6) 

93.7 



Routing-Srror 

Codes 
20 40 60 

III 



Non-Response 



Codes 1 RESPONSE BLANK LEGITSKIP 



4 to 97) (! 


?3) (98) 


(ii) 








0.4 


71.9 


18,659 


0.1 


6,5 


1,048 




5.7 


l,u4o 


1.7 


26.3 


l,(J4o 


1.1 


Art J 

82,4 


1,048 


(4.6) 


/ft ^\ 

(8,5) 


(18,348) 


1.1 


52.9 


1,048 


(1-2) 


(50,9) 


(1,902) 


0.8 


do o 
00,0 

/On 7\ 


i niQ 
/ifi wfi\ 


(3.1) 
0.7 


(39.7) 
73.2 


1,048 


(0.8) 


(72,1) 


(1,902) 


0.6 


91.8 


1,048 


(3.3) 


(56.6) 


(18,348) 


0.8 


76.2 


1,048 


(0.8) 


(75.3) 


(1,902) 


0.7 


91.6 


1,048 


(3.5) 


(55.9) 


(18,348) 


0.3 


6.1 


1,048 



See notes, at end of Table 3, pp. 145-147, 



TABLE 3 (Cont'd)

RESPONSE DISTRIBUTIONS, SELECTED FIRST 
FOLLOW-UP AND BASE- YEAR ITEMS 



item Eligible ■ Usable 
Number to Answer Respons 






% 


F13A 


21 f 350 


89,6 
83,2 


F13B 
FloC 


21,350 
21,350 


92,5 


F14 
F16A 


21,350 


92,8 


F16B 


21,350 


85,2 


F16C 


21,350 


88,9 


F16D 


21,350 


84.4 


F16E 


21,350 


84.8 


F16F 


21,350 


63.5 


F19 


21,350 


91.0 


F21 


21,350 
4,891 


98.0 
86.9 


F22AA 
F22AB 


4,891 


73.8 


F22AC 
F22AD 


'4,891 
4,891 


73,7 
72.9 


F22AE 
F22AF 


4,891 
4,891' 


73.5 
72.6 



Unusable Responses 
Routing-Error "Garbage 11 

Codes Codes 3 
20 40 60 (94 to 97) 



Non-Response 
PARTIAL 



0.1 0.9 



1,3 0,3 



0.7 

* 

0,3 
0,3 
0.2 
0.3 
0.3 
0,2 
0.2 
0,4 

2.6 

* 

0,2 

* 

t 
t 
* 



RESPONSE 
(93) 



6.6 
14,3 
10.5 
13,1 
14.7 
35.9 



8,5 
21,7 
21.8 
22,6 
22,0 
22.9 



BLANK 
)8) 



9.7 
13.2 
37.3 



0.3 
0,3 
0,3 
0.3 
0,3 
0.2 
6.4 
0,5 
4.4 
4.5 
4,5 
4,5 
4.5 
4.5 



LEGITSKIP 

(nurrBer) 

1,048 
1,048 
14,879 
8 
18 

1,048 
1,048 
1,048 
1,048 
1,048 
1,048 
1,048 
17,507 
17,507 
17,507 
17,507 
17,507 
17,507 



N 



See notes at end of Table 3, pp. 145-147, 






TABLE 3 (Cont'd) 



RESPONSE DISTRIBUTIONS, SELECTED FIRST 
FOLLOW-UP AND BASE- YEAR ITEMS 



Item 
Number 



F22AG 

F22AH 

F22AI 

F22B 

F22C 

F22D 

F22E 

F23 

F24A 

F24B 

F24C 



F24E 
F24F 
F24G 
F24H 
F24I 
F24T 



Number 
Eligible 
to Answer 

W 

4,891 
4,891 
4,891 
4,874 
4,891 
4,891 

4,891 
21,350 
8,118 
8,118 
8,118 

8,118 
8,118 
8,118 
8,118 
8,118 
8,118 
8,118 



Usable 
Responses 

TP 

73.4 
74,7 
68,1 
83.5 
86.8 
86.6 
87.1 
87,5 
79.8 
79,9 
79.5 
79.4 
79.4 
79.3 
79.3 
79.3 
79.2 
79.3 



Unusable Responses 
Routing-Error "Garbage" 



Non-Response 



Codes 



111 



6.1 3.7 2.4 



Codes a RESPONSE BLANK LEGITSKIP 



to 97) (93) (98) (99) 


(%) 






(number) 


* 


22.0 


4.5 


17,507 


* 


20.7 


4.5 


17,507 


* 


37.3 


4.5 


17,507 


0.6 




15.8 


17,524 


0.4 




12.7 


17,507 


0.6 




12.8 


17,507 


0,1- 




12.8 


17,507 


* 




0.3 


1,048 


0.2 


3.5 


16.5 


14,280 


0.2 


3.4 


16.5 


14,280 


0.2 
0.2 


3.8 
3.9 


16.5 
16.5 


14,280 
14,280 


0.2 


3.9 


16.5 


14,280 


0.2 


4.0 


16.5 


14,280 


0.2 


4.0 


16.5 


14,280 


0.2 


4.1 


16.5 


14,280 


0.2 


4.1 


16.5 


14,280 


0.2 


4.1 


16.5 


14,280 



N 



See notes at end of Table 3, pp. 145-147. 






TABLE 3 (Cont'd) 



RESPONSE DISTRIBUTIONS, SELECTED FIRST 
FOLLOW-UP AND BASE- YEAR ITEMS 



Number 

Item Eligible Usable 
Number to Answer Responsi 




m 


«r 


F24K 


8,118 


79.7 


F24L 
F24M 


8,118 
8,118 


79.6 
79,9 




8 11)1 


7Q n 

/7,U 


F240 


8,118 


79,4 


F24P 


8,118 


30,0 


F24Q 


8,118 


78,8 


F25 


15,903 


90,0 


F26B 


12,177 


73,7 


F26C 


12,177 


73,6 


F27AA 


12,177 


73.3 


F27AB 
F27B 


12,177 
12,177 


73,3 
73,6 


F27C 


12,177 


71,2 


F27D : 
F28A 


12,177 


73.0 
76,6 


F28B 


12,002 
12,177 


70,9 


F28C 


11,825 


72.1 



Unusable Rssponses 
Routing-Error "Garbage 11 

Codes Codes 3 
20 40 60 (94 to 97) 

ill nr 

0,2 
0,2 
0,2 
0.2 
0.2 
0.2 
0,2 

1,4 0,3 * 

0.3 
0,1 
0,2 
0,2 
0,1 
0.8 

- — 0.3- 

0,1 

1,3 3.2 0,3 

0,4 



Non-Response 

Wtial 

RESPONSE BLANK 
(93) (98) 



3.6 
3,7 
3.4 
4.2 
3.9 
3.4 
4.5 



16,5 
16,5 
16,5 
16.5 
16,5 
16,5 
16,5 
8,3 
25,9 
26,2 
26,6 
26.5 
26,3 
28,0 

26.7 

23,3 
24,3 
27,5 



LEGITSKIP 
(99) 



er 



14,280 
14,289 
14,280 
14,280 
14,280 
14,280 
14,280 
6,495 
10,221 
10,221 
10,221 
10,221 
10,221 
10,221 
10,221 

10,396 
10,221 
10,573 



See notes at end of Table 3, pp. 145-147.



TABLE 3 (Cont'd) 



RESPONSE DISTRIBUTIONS, SELECTED FIRST
FOLLOW-UP AND BASE-YEAR ITEMS



Unusable Responses 



Item 



F29A 

F29BA 

F29BB 

F29BG 

F29BD 

F29BE 

F29BF 

F29BG 

F29BI 
F29BJ. 
F29BK 
F29BL 
F29BM 

raw 

F29BO 
F29BP 
F29BQ 
F29BE 



Number 




Won- Response 
MflAL 



Eligible 
to Answer 


Usahle Coc 
Responses 20 41 


lee 

3 60 


Codes 
(v4to y/j 


RESrUPibE 

(yd) 




[77 1 _ 


15,903 


"<se * § fi 

85 *5 2,8 1, 


D D 

a i *) 
,V 1.4 


a) ' 




IT 

Q 1 

O ,6 


0,5 70 


5,051 
5,051 


55,6 




0,2 


4.1 


39.5 


17,347 


54,5 




0,2 


5,8 


39.5 


17,w7 


5,051 


53.8 




0,2 


6.4 


39,5 


17,347 


5,051 


53.6 




0,2 


6.7 


39.5 


17,347 


5,051 


Fn 

53.3 






fi Q 


30 ■? 

Q7.Q 




5,051 






ft ? 


7 0 


39 5 


17-347 ^ 


5,051 
5,051 


53,3 
53.3 




ft 1 

a n 

0.2 


7 ft 

*7 A 

7* U 


□9,5 


1*7 3H*7 

1 /p04/ 


5,051 
5,051 
5,051 


53.3 




ft 0 








53.3 
53,8 




0.2 
0.2 


7.0 
6,4 


39,5 
39,5 


17,347 
17,347 


5,051 
5,051 


53.6 
54.2 




0,3 
0.3 


6.6 
6.0 


39,5 
39.5 


17,347 
17,347 


5,051 


53.8 




0.2 


6.5 


39,5 


17,347 


5,051 


53.4 




0.3 


6.8 


39,5 


17,347 


5,051 
5,051 


53.6 




0,2 


6,? 


39.5 


17,347 


54.4 




0.3 


5.7 


39,5 


17,347 


5,051 


53.2 




0.2 


7,0 


39.5 


17,347 



See notes at end of Table 3, pp. 145-147. 






TABLE 3 (Cont'd) 



RESPONSE DISTRIBUTIONS, SELECTED FIRST 
FOLLOW-UP AND BASE-YEAR ITEMS





Nuiriher 




Item 


Eligible 


Usable 


Number 


to Answer 


SfiSDons 




WIT 


w 


F3Q 


14,077 


77 8 


F31A 


4,884 


277 


F3LB 


4 884 


27 4 


F31C 


4,884 


27 5 


F31D 


4.884 

^1 wry 4 


27 4 


F31E 


4 S84 


27 3 


F31F 


4 884 


27 3 


F31G 


4.834 


27 2 


F31H 


4.184 


27 6 


P31I 


4.884 


27 4 


F31I 


4,884 


27 2 


F31K 


4 S 884 


27,3 


F32B 


7,438 


59,9 


F32C 


7,438 


60.1 


Bl 


' 14,077 


--78.3 


F33C 


14,077 


75.3 


F34 


14,077 


79,4 


F37 


8,061 


62.6 



Unusable R espons es 
Kouting-Error "Garbage" 

Codes Codes 9 
20 40 60 (94 to 97) 

III T 

3,1 2,5 * 

0.2 
0,2 
0,2 
0.2 
0.2 
0,2 
0.2 
0,2 

0.2 
0,2 
0,2 
0,4 
0,4 
0,2 
1.2 



Non-Response 
PAKTiAL " 

RESPONSE BLANK LEGITSKIP 



W) pi im 



vol 




(numoer) 


U 


10 ,D 

70,7 


8, 321 
17,514 


1.7 


70,7 


17,514 


u 


70,7 


17,514 


1.7 


70,7 


17,514 


1,8 


70.7 


17,514 


1,7 


70.7 


17,514 


1 0 

1,1 


70, 7 


17,514 


1,4 


70. 7 


17,514 


1.7 


70,7 


17,514 
17,514 


1,8 


70,7 


1.8 


70,7 
39,8 
39,5 


17,514 
14,960 


\ 


21.4 
23.5 
20,5 


14,960 
8,321 




37,2 


8,321 
14,337 



See notes at end of Table 3, pp. 145-147.



TABLE 3 (Cont'd) 



RESPONSE DISTRIBUTIONS, SELECTED FIRST 
FOLLOW-UP AND BASE-YEAR ITEMS





iNuiuDer 




Item 


Eligible 


Usable 


Number 


to Answer 


Responses 






w 


F39 




50,0 


F40B 


S 001 


iU.Q 




5,221 


31.4 


F41B 


15,903 


81.2 


F41CA 
F41CB 


- omit! 
lo, 74o 




F4111 






F42 




50*0 


F4o 










68.4 


F46AB 
F46BA 


15,903 
15,903 


59.5 


F46BB 


15,903 


40,4 


- F46BG • 


15,903 • 


58.6 


F46BD 


15,903 


46,1 


F46BE 


15,903 


40.9 


F47AA 


15,339 


66,3 



Unusable Responses 
louting-Error "Garbage 11 

Codes Codes 1 
20 40 60 (94 to 97) 

1 I I T 

0,2 
0.1 
0,2 

-see footnote ?f c" — 

1.9 
2,5 

* 

1.9 
2.5 
1,9 
1,9 

— ----- 2.0 

2.6 
2.4 
0.6 



Non-Re sponse 
PARTIAL 

RESPONSE BLANK LEOITSKIF 

(93) (98) (99) 

CD IF (number) 

14,3 6,495 

63.0 17,177 

68.5 17,177 

18.6 6,495 

57.7 8,653 £ 
33,7 8,659 
13.3 6,495 

12.6 6,495 

26.7 6,495 

29.1 6,495 
38,6 6,495 

57.8 6,495 
99:4- 6rf55 — 

51,3 6,495 

56,6 6,495 

33,1 6,509 



f 



See notes at end of Table 3, pp. 145-147.



TABLE 3 (Cont'd) 



RESPONSE DISTRIBUTIONS, SELECTED FIRST 
FOLLOW-UP AND BASE-YEAR ITEMS



Unusable Responses 



Number 



Item 


to Answer 






(lUUfg) 


IB 


EMI AD 


10,057 


04 f S 


H/oA 
f ^/JJp 


10 HAS 

12 068 


M 8 

y ? 




fi 4fi7 

fl in? 


JU, / 


r4/iJA 


0,040 


10 0 


fi/iJc 


0,040 


10 k 
17.D 


F47EA 


5,860 


9.1 


F47EB 


5,860 


9.0 


F47FA 


0,454 




F47FB 


5,484 


2.8 


F47GA 


5,355 


0.6 


F47GB 


5,355 


0,0 


F48A 


21,350 


95.0 


- F48C 


— 8 V 072^ 


— S8t6- 


F49A 


14,306 


94.0 


F49G 


14,306 


91.6 


See notes at end of Table 3, pp. 145-147.



Routing-Error 

Codes 
20 40 60 

III 



1,3 2,8 0,4 



55? P 



Non-Response 



"Gar¥ge 
Codes 1 



RESPONSE BLANK LEG1TSKIP 



(94 to 97) (93) (98) (99) 


(I) n 

2 2 
0 6 


V\ 1 

tJU, I 

43 S 


(number) 
6 509 


2 2 


43 1 


10 W 

lUjUiJU 


n 0 

u,y 


02,5 




1 7 




iq QQ1 




7Q n 


J,J,/J£ g 


1.3 


79.0 


15,752 


1,1 


89.6 


16,538 


1.4 


89.6 


16,538 


1.3 


95,8 


16,914 
16,914 


1.4 


95,8 


1,3 


98,1 


17,043 


1,3 


98,1 


17,043 




0,4 


1,048 




11,3 


14,326 


0,5 


5,6 


8,092 


0,1 


8,2 


8,092 






TABLE 3 (Cont'd) 



RESPONSE DISTRIBUTIONS, SELECTED FIRST 
FOLLOW-UP AND BASE-YEAR ITEMS



Item 
Number 


Eligible 

to Aaswer 


UsaMe 
Responses 




Wo) 


CD 


F50A 


a i HA/ 

14, 306 


on Q 


F50B 


14,oUD 


04, 0 


F54A 
F54C 


21,o!>0 
9, 968 


50,0 


Fd5A 
F56A 
F56B 


7, 983 

Li) /qU 

12,780 


81,0 


F58A 
F58B 


21,350 
21,350 


96,8 
88,7 


F58C 


21,o5U 




F78A 


21,350 


92,3 


F78B 


21,350 


92.9 


F79 


21,350 


87.0 


-F80A — 


. 21,350 


^ 90,1- . 


F80B 


21,350 


89.3 


F80C 


21,350 


88.0 
90,5 


F81 
F82B 


21,350 
11,769 


83,5 



Unusable Responses 
Routing-Error "Garbage 1 
Codes Codes 5 

B 



20 40 



1,7 5.4 0.6 



0.8 0.4 



(94 to 97) 



) 



5.5' 

1.7 

* 

0,1 

0.6 
5,3 s 
1.6 
1.0 
0.7 
0,4 
0.6 
0.4 
2.4 
2.7 
2.8 
3,7 
0,1 
0,4 



d 



Non-Response 
TOTIAL 
RESPONSE 

(93) 



BLANK 
P) 

"IT 



11,7 
13.6 
0.8 
13.3 
13,0 
15.2 
16.7 
2.1 
10,7 
6.6 
7,1 
6.7 
9,7 
7.2 
7.8 
8.4 
8.1 
16.1 



LEGITSKIP 

(99) 
(number) 



8,092 
8,092 
1,048 
12,430 
14,415 
9,618 
9.618 
1,048 
1,048 
1,048 
1,048 
1,048 
1,048 
.1,048 
1,048 
1,048 
1,048 
10,629 



CO 



See notes at end of Table 3, pp. 145-147.



TABLE 3 (Cont'd)

RESPONSE DISTRIBUTIONS, SELECTED FIRST
FOLLOW-UP AND BASE-YEAR ITEMS




- 




Number 




Unusable Responses Non-Re 
Duting-Error ' "Garbage" FMT1ST 


i 

isponse 




Item 


Eligible 


SB 13 

Usable 


Codes Codes 1 RESPONSE 


BLANK 


LEGITSKIP 


Number 


to Answer 


Responses 2( 


) 40 60' (94 to 97) (93) 


(98) 


(99) 

\77f 




1100%) 




) (I) (I) (I <K) 


' w 


(number) 


F82C 


11,769 


80.8 0, 


4 1.3 0.1 


17,3 


10,629 


F82DA 


4,410 


32,6 


1,8 


65,6 


17,988 


F82DB 


4,410 


31.7 


1.6 


66.7 


17,988 


F82DC 


4,410 


18.3 


1.8 


79.9 


17,988 
10,629 H 


F83AA 


11,769 


75,2 


0,2 


24.6 


F83B 


6,428 


64,0 


0,8 


35,2 


15,970 i 


F83C 


6,428 


64,4 


0,3 


35,3 


15,970 


F83DA 


3,203 


16.4 


1,7 


81,9 


19,195 


F83DB 


3,203 


12.0 


1,3 


86,6 


19,195 


F83DC 


3,203 


7 6 


1 4 


Qi n 

71. U 




BY2 
BY5 


21,222 
21,222 


97.0 
97,8 




3,0 


1,176 


BY8 


21,222 


97,8 




2.2 
2,2 


1,176 
1,176 


BY93 


21,222 


78.1 




2i q 




BW4A- - 
BY94B 


21,222- 
21,222 


95,9 
96,5 


. . .. .. .. ■■ .■. - ■ - 


4,1 


1,176 


BY94C 


21,222 


96.6 




3,5 

3,3 ■ 


1,176 
1,176 
1.176 


BY94D 


21,222 


96.5 




3,5 



See notes at end of Table 3, pp. 145-147.



TABLE 3 (Cont'd) 

RESPONSE DISTRIBUTIONS, SELECTED FIRST
FOLLOW-UP AND BASE-YEAR ITEMS



Item 
Number 



BY94E 

BY94F 

BY94G 

BY94H 

BY94I 

BY94J 

BY94K 

BY84 

BY92 

BY95 

BY83 

BY90Ai 

BY90Bi 



Number 
Eligible 
o Answer 

WT 

21,222 
21 p 222 
21,222 
21,222 
21,222 
21,222 
21,222 
21,222 
21,222 
21,222 
21,222 
16,683 
16,683 



Usable 
Responses 

IP 



96.4 
96.5 
95,8 
95.9 



95.0 
96.2 
98.2 
96,2 
96.3 

96.7 
84.2 
86,2 



Unusable Re sponses 

"Garbage" 
Codes a 



Non-Response 



Routing-Error 
Codes 
1 



20 40 15 
111 



(94 to 97) 



3,2 
2.1 




LANK 



LEGITSKIP 
(99) 



w 




3,6 
3.5 


1,176 
1,176 


4.2 


1,176 


4.1 


1,176 


4,0 
5.0 


1,176 
1,176 


3,8 


1,176 


1,8 
3,8 


1,176 
1,176 


3.7 


1,176 


3,3 
12,6 


1,176 
5,715 



H 



11.8 



5,715 







TABLE 3 (Cont'd) 

RESPONSE DISTRIBUTIONS, SELECTED FIRST
FOLLOW-UP AND BASE-YEAR ITEMS



NOTES: "Usable responses" includes all cases tabulated in within-limits, specific coding categories. "Unusable
(Cont'd) responses" includes all cases which are not interpretable, beyond acceptable value limits, or whose
       validity is questioned owing to routing-pattern errors. "LEGITSKIP" includes cases not expected (not
       eligible) to answer the item. See the discussion of routing-error codes for some qualifications regarding
       the "usable" and "unusable" designations.

       Items designated "F-" are from the First Follow-up Questionnaire only. Items designated "BY-" are
       basic background data for which information was collected from about 80 per cent of the respondents via
       the Base Year Questionnaire. This information was obtained from 4,539 respondents via First Follow-up
       Questionnaire, Form B, items 86-99. RTI has merged the latter data with the Base Year data in
       reporting distributions. High response rates for "BY-" items are probably attributable to the supervised
       data collection procedure used with the Base Year Questionnaire.

     a "Garbage" codes are "Don't Know" (94), "Out of Range" (95), "Multiple Response" (96), "Refused Answer"
       (97), plus cases judged outside reasonable limits for free-response numerical items by RTI.

     b Figures in parentheses represent estimates for the preceding item, based on the revised number of
       eligibles shown. See text p. 31 for discussion of the downward revision for "Spouse 1973 income" (items
       F11B, D, F, H) and "Respondent's 1973 income" (F11C, E, G).

     c F41CA distribution omitted owing to apparent tabulation error in published data. OUT OF RANGE (code
       95) is listed with 19,947 cases.

1 {espouses judged outside reasonable limits by RTI account for most [U per cent) of these "garbage 
itix coded" responses, \a 






TABLE 3 (Cont'd) 



e Responses judged outside reasonable limits by RTI account for most (4.2 per cent) of these "garbage
coded" responses.

f Includes "does not apply," with n=515 (2.4 per cent) for F80A, n=542 (2.5 per cent) for F80B, and
n=704 (3.3 per cent) for

g Items BY90A and BY90B are (respectively) Father's and Mother's education, as collected in the Base
Year Questionnaire only. Included here for comparison with items F78A and F78B, which represent
the same variables as collected in the First Follow-up Questionnaire. Categories for the [...]
match exactly. "Garbage code" cases for BY90A and B are "does not apply" responses.



Source: National Longitudinal Study of the High School Class of 1972: Base Year and First
Follow-up Data File User's Manual (Preliminary). Research Triangle Park, N.C.: Research Triangle
Institute, 1975.
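
The three-way split described in the notes to Table 3 (usable, unusable, legitimate skip) can be illustrated with a short sketch. It is only an illustration: the function name, its arguments, and the sample records are hypothetical, and only the four garbage codes (94-97) are taken from note a. How eligibility and routing-pattern errors are actually flagged on the RTI file is not represented here.

```python
from collections import Counter

# Garbage codes as listed in note a to Table 3.
GARBAGE_CODES = {
    94: "Don't Know",
    95: "Out of Range",
    96: "Multiple Response",
    97: "Refused Answer",
}


def classify_response(code, eligible=True, routing_error=False, within_limits=True):
    """Classify one item response (hypothetical flags, for illustration only)."""
    if not eligible:
        # Respondent was not expected to answer the item.
        return "legitimate skip"
    if code in GARBAGE_CODES or routing_error or not within_limits:
        # Garbage code, questioned validity, or value outside reasonable limits.
        return "unusable"
    return "usable"


# Made-up records: (code, eligible, routing_error, within_limits)
sample = [
    (3, True, False, True),
    (94, True, False, True),
    (2, False, False, True),
    (5, True, True, True),
    (1, True, False, False),
]
tally = Counter(classify_response(*record) for record in sample)
print(dict(tally))
```

A tally of this kind corresponds to the usable, unusable, and LEGITSKIP columns reported for each item in the table.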






TABLE 4

IMPACT OF ALLOCATION ON 1970 SCHOOL
ENROLLMENT DISTRIBUTION

                                      Distribution
Levels                  Without Allocation     With Allocation                   Proportionate
(School level in              (1)        (2)        (3)        (4)        Change        Change
which currently            Number   Per Cent     Number   Per Cent           (5)           (6)
enrolled)                (1,000s)          a   (1,000s)          a     (Col. 2 -      (Col. 5/
                                                                         Col. 4)       Col. 2)

Nursery School              906.4        1.6      952.8        1.6             0            0%
Kindergarten              2,945.7        5.2    3,022.4        5.1          -0.1
Elementary School        31,794.8       56.3   33,210.2       56.5           0.2          0.3%
High School              13,974.6       24.7   14,480.6       24.6          -0.1
College                   6,865.8       12.2    6,966.0       11.8

Total Reported           56,487.1      100.0
Total Not Reported        2,147.9
Per Cent Allocated    3.7 Per Cent
Average                                                                   0.16 b

a Base is Total Reported. May not add to 100 due to rounding.

b Simple average of entries, disregarding sign.

Source: Bureau of the Census. Census of Population: 1970, Vol. 1, Characteristics of the Population, Part 1,
United States Summary, Section 2. Washington: GPO, 1973, Appendix C, pp. 68-69; ref. Table C-3
(p. 1-572), Table 197 (p. 1405).
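
The derived columns of the allocation tables follow from the two per cent columns, and the arithmetic can be sketched in a few lines of Python. The sketch recomputes columns (5) and (6) for the school enrollment distribution from the per cents in columns (2) and (4). One assumption should be flagged: although column (5) is headed "Col. 2 - Col. 4", the printed entries appear to carry the sign of Col. 4 minus Col. 2 (in Table 6, for example, the "None" row moves from 1.7 to 1.6 and is entered as -0.1), so the sketch uses that sign. The variable and function names are illustrative only.

```python
# Per cent without allocation (col. 2) and with allocation (col. 4),
# taken from the school enrollment table.
rows = {
    "Nursery School": (1.6, 1.6),
    "Kindergarten": (5.2, 5.1),
    "Elementary School": (56.3, 56.5),
    "High School": (24.7, 24.6),
    "College": (12.2, 11.8),
}


def allocation_impact(rows):
    """Return {level: (change, proportionate change in per cent)}."""
    impact = {}
    for level, (without, with_alloc) in rows.items():
        # Assumed sign convention: with allocation minus without allocation.
        change = round(with_alloc - without, 1)
        proportionate = 100.0 * change / without
        impact[level] = (change, proportionate)
    return impact


impact = allocation_impact(rows)
# "Simple average of entries, disregarding sign" (note b).
avg_abs_change = sum(abs(change) for change, _ in impact.values()) / len(impact)

for level, (change, proportionate) in impact.items():
    print(f"{level:<18} {change:+.1f} {proportionate:+.1f}%")
print(f"Average change, disregarding sign: {avg_abs_change:.2f}")
```

The average change works out to 0.16, which agrees with the average entered in column (5). Proportionate changes computed this way from the rounded per cents may differ from the printed entries in the last digit.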



TABLE 5

IMPACT OF ALLOCATION ON 1969
"WEEKS WORKED" DISTRIBUTION

                                      Distribution
Levels                  Without Allocation     With Allocation                   Proportionate
(Number of weeks              (1)        (2)        (3)        (4)        Change        Change
worked in 1969,            Number   Per Cent     Number   Per Cent           (5)           (6)
employed persons         (1,000s)          a   (1,000s)          a     (Col. 2 -      (Col. 5/
aged 16 and over)                                                        Col. 4)       Col. 2)

50 - 52                  50,188.1       58.6   53,662.0       58.1          -0.5         -0.8%
48 - 49                   4,978.1        5.8    5,397.0        5.8             0            0%
40 - 47                   7,256.4        8.5    7,877.7        8.5             0            0%
27 - 39                   7,198.2        8.4    7,851.1        8.5           0.1          1.2%
14 - 26                   7,028.4        8.2    7,709.4        8.3           0.1          1.2%
13 or less                8,981.0       10.5    9,912.8       10.7           0.2

Total Reported           85,630.3      100.0   92,410.0      100.0
Total Not Reported        9,145.8
Per Cent Allocated    7.3 Per Cent
Average                                                                   0.15 b        0.8% b

a Base is Total Reported. May not add to 100 due to rounding.

b Simple average of entries, disregarding sign.



Source: Same as Table 1, except Census Tables C-3, p. 1-573 and 218, p. 1-70!



TABLE 6

IMPACT OF ALLOCATION ON 1970 EDUCATIONAL
ATTAINMENT DISTRIBUTION

                                      Distribution
Levels                  Without Allocation     With Allocation                   Proportionate
(Highest grade                (1)        (2)        (3)        (4)        Change        Change
completed, persons         Number   Per Cent     Number   Per Cent           (5)           (6)
aged 25 or older)        (1,000s)          a   (1,000s)          a     (Col. 2 -      (Col. 5/
                                                                         Col. 4)       Col. 2)

None                      1,733.9        1.7    1,767.7        1.6          -0.1         -5.8%

Elementary:
  1 - 4                   3,794.7        3.7    4,271.6        3.9           0.2          5.4%
  5 - 6                   5,542.4        5.5    6,217.1        5.7           0.2          3.6%
  7                       4,339.0        4.3    4,815.6        4.4           0.1          2.3%
  8                      12,816.5       12.6   14,015.4       12.8           0.2          1.6%

High School:
  1 - 3                  19,407.0       19.1   21,285.9       19.4           0.3          1.6%
  4                      32,138.9       31.7   34,153.1       31.1          -0.6

College:
  1 - 3                  10,748.2       10.6   11,650.7       10.6             0            0%
  4                       6,265.4        6.2    6,657.6        6.1          -0.1         -1.6%
  5 or more               4,689.4        4.6    5,059.7        4.6             0            0%






TABLE 6 (Cont'd)

IMPACT OF ALLOCATION ON 1970 EDUCATIONAL
ATTAINMENT DISTRIBUTION

                                      Distribution
Levels                  Without Allocation     With Allocation                   Proportionate
(Highest grade                (1)        (2)        (3)        (4)        Change        Change
completed, persons         Number   Per Cent     Number   Per Cent           (5)           (6)
aged 25 or older)        (1,000s)          a   (1,000s)          a     (Col. 2 -      (Col. 5/
                                                                         Col. 4)       Col. 2)

Total Reported          101,475.3      100.0  109,899.4      100.0
Total Not Reported        8,424.4
Per Cent Allocated    7.7 Per Cent
Average                                                                   0.18 b

a Base is Total Reported. May not add to 100 due to rounding.

b Simple average of entries, disregarding sign.

Source: Same as Table 1, except Census Tables C-3, p. 1-572 and 199, p. 1-627.






TABLE 7

IMPACT OF ALLOCATION ON 1969 FAMILY
INCOME DISTRIBUTION

                                      Distribution
Levels                  Without Allocation     With Allocation*                  Proportionate
(1969 Family income,          (1)        (2)        (3)        (4)        Change        Change
in dollars)             Number of   Per Cent  Number of   Per Cent           (5)           (6)
                         Families          a   Families          a     (Col. 2 -      (Col. 5/
                         (1,000s)              (1,000s)                  Col. 4)       Col. 2)

less than $1,000            918.7        2.3    1,276.7        2.5           0.2          8.7%
1,000 - 1,999             1,324.4        3.3    1,734.3        3.4           0.1          3.0%
2,000 - 2,999             1,749.8        4.3    2,261.9        4.4           0.1          2.3%
3,000 - 3,999             1,938.3        4.8    2,501.2        4.9           0.1          2.1%
4,000 - 4,999             2,021.9        5.0    2,603.3        5.1           0.1          2.0%
5,000 - 5,999             2,307.4        5.7    2,936.1        5.7             0            0%
6,000 - 6,999             2,497.7        6.2    3,148.1        6.2             0            0%
7,000 - 7,999             2,776.9        6.8    3,453.4        6.7          -0.1         -1.5%
8,000 - 8,999             2,952.4        7.3    7,102.8*      13.9*         -0.3*        -2.1%
9,000 - 9,999             2,815.0        6.9
10,000 - 11,999           5,377.9       13.2   13,625.7*      26.6*         -0.7*        -2.6%
12,000 - 14,999           5,709.3       14.1
15,000 - 24,999           6,442.0       15.9    8,182.6       16.0           0.1          0.6%
25,000 - 49,999           1,467.4        3.6    1,974.8        3.9           0.3          8.3%
50,000 or more              290.3        0.7      367.6        0.7             0            0%






TABLE 7 (Cont'd)

IMPACT OF ALLOCATION ON 1969 FAMILY
INCOME DISTRIBUTION

                                      Distribution
Levels                  Without Allocation     With Allocation*                  Proportionate
(1969 Family income,          (1)        (2)        (3)        (4)        Change        Change
in dollars)             Number of   Per Cent  Number of   Per Cent           (5)           (6)
                         Families          a   Families          a     (Col. 2 -      (Col. 5/
                         (1,000s)              (1,000s)                  Col. 4)       Col. 2)

Total Reported           40,589.5      100.0   51,168.6      100.0
Total Not Reported       10,579.1
Per Cent Allocated    20.7 Per Cent
Average                                                                   0.15 b

a Base is Total Reported. May not add to 100 due to rounding.

b Simple average of entries, disregarding sign.

Source: Same as Table 1, except Census Tables C-3, p. 1-574 and 252, p. 1-923.
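
The "Per Cent Allocated" line of each table can be checked against the printed totals. The sketch below assumes, as an interpretation rather than a definition stated in the tables, that the figure is the share of the with-allocation total that was not reported and therefore allocated. The totals are those shown for Tables 5, 6 and 7; the function and variable names are illustrative.

```python
# Totals in thousands: (total reported, total with allocation).
totals = {
    "Table 5": (85630.3, 92410.0),
    "Table 6": (101475.3, 109899.4),
    "Table 7": (40589.5, 51168.6),
}


def per_cent_allocated(reported, with_allocation):
    """Allocated cases as a percentage of the with-allocation total (assumed definition)."""
    return 100.0 * (with_allocation - reported) / with_allocation


for table, (reported, with_allocation) in totals.items():
    share = per_cent_allocated(reported, with_allocation)
    print(f"{table}: {share:.1f} per cent allocated")
```

Under this reading the three tables give 7.3, 7.7 and 20.7 per cent, the same figures as their "Per Cent Allocated" lines.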



